library(ggplot2)
library(dplyr)
library(statsr)
load("gss.Rdata")
dim(gss)[1]
## [1] 57061
variable.names(gss)
## [1] "caseid" "year" "age" "sex" "race" "hispanic"
## [7] "uscitzn" "educ" "paeduc" "maeduc" "speduc" "degree"
## [13] "vetyears" "sei" "wrkstat" "wrkslf" "marital" "spwrksta"
## [19] "sibs" "childs" "agekdbrn" "incom16" "born" "parborn"
## [25] "granborn" "income06" "coninc" "region" "partyid" "polviews"
## [31] "relig" "attend" "natspac" "natenvir" "natheal" "natcity"
## [37] "natcrime" "natdrug" "nateduc" "natrace" "natarms" "nataid"
## [43] "natfare" "natroad" "natsoc" "natmass" "natpark" "confinan"
## [49] "conbus" "conclerg" "coneduc" "confed" "conlabor" "conpress"
## [55] "conmedic" "contv" "conjudge" "consci" "conlegis" "conarmy"
## [61] "joblose" "jobfind" "satjob" "richwork" "jobinc" "jobsec"
## [67] "jobhour" "jobpromo" "jobmeans" "class" "rank" "satfin"
## [73] "finalter" "finrela" "unemp" "govaid" "getaid" "union"
## [79] "getahead" "parsol" "kidssol" "abdefect" "abnomore" "abhlth"
## [85] "abpoor" "abrape" "absingle" "abany" "pillok" "sexeduc"
## [91] "divlaw" "premarsx" "teensex" "xmarsex" "homosex" "suicide1"
## [97] "suicide2" "suicide3" "suicide4" "fear" "owngun" "pistol"
## [103] "shotgun" "rifle" "news" "tvhours" "racdif1" "racdif2"
## [109] "racdif3" "racdif4" "helppoor" "helpnot" "helpsick" "helpblk"
The General Social Survey (GSS) is a survey designed to collect information to monitor and explain trends in the attitudes, behaviors and attributes of residents of the United States.
Since 1972, most of the data have been collected from randomly selected English- or Spanish-speaking people aged 18 or older who live in the United States and are willing to complete a questionnaire through a computer-assisted personal interview (CAPI), a face-to-face interview, or a telephone interview.
Participation is voluntary, so the sample may be biased toward people who were more willing or able to complete the survey. However, because respondents were randomly selected and the data set contains 57061 respondents and more than 100 variables, the results are believed to generalize to the residents of the United States as a whole.
Because the data are observational, with no random assignment, we cannot draw causal conclusions.
Does people’s opinion on whether the government should improve the standard of living of all poor Americans depend on their party affiliation?
First I have to select the variables to use, in this case partyid and helppoor, and create a table as a numerical summary.
table <- table(gss$partyid,gss$helppoor)
table
##
##                      Govt Action Agree With Both People Help Selves
##   Strong Democrat           1364            1758                329
##   Not Str Democrat          1079            2609                429
##   Ind,Near Dem               608            1600                230
##   Independent                777            1956                404
##   Ind,Near Rep               252            1085                368
##   Not Str Republican         423            2039                602
##   Strong Republican          217            1011                660
##   Other Party                 67             153                 55
As the table shows, more Democrats than Republicans think the government should help poor Americans improve their standard of living, but raw counts do not show what proportion of each party holds each opinion. Next, we plot the distribution of opinions within each party, and draw a mosaic plot that shows the proportion of each party holding each opinion.
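Before plotting, the within-party proportions can also be read off numerically. The following is only a minimal sketch: it re-enters the counts printed above by hand, and the names `counts` and `row_props` are illustrative, not part of the original analysis.

```r
# Counts re-entered from the table printed above
# (rows: party affiliation, columns: opinion on helping the poor).
counts <- matrix(
  c(1364, 1758, 329,
    1079, 2609, 429,
     608, 1600, 230,
     777, 1956, 404,
     252, 1085, 368,
     423, 2039, 602,
     217, 1011, 660,
      67,  153,  55),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem", "Independent",
      "Ind,Near Rep", "Not Str Republican", "Strong Republican", "Other Party"),
    c("Govt Action", "Agree With Both", "People Help Selves")
  )
)

# margin = 1 divides each cell by its row total, so every row sums to 1.
row_props <- prop.table(counts, margin = 1)
round(row_props, 3)
```

With the data loaded, the same proportions should come from applying `prop.table(..., margin = 1)` to the table built from `gss$partyid` and `gss$helppoor`.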
# Drop respondents with a missing answer and the small "Other Party" group before plotting.
ggplot(gss %>% filter(!is.na(helppoor), !is.na(partyid), partyid != "Other Party"), aes(partyid, fill = helppoor)) +
geom_bar(position = "dodge") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
mosaicplot(table, shade = TRUE)
These graphs show the basic distribution and proportions of opinions, but to get a clearer view of the data I want to group some of the opinions and some of the party affiliations.
partyid2 <- recode(gss$partyid,
  "Strong Democrat" = "Democrat", "Not Str Democrat" = "Democrat", "Ind,Near Dem" = "Democrat",
  "Ind,Near Rep" = "Republican", "Not Str Republican" = "Republican", "Strong Republican" = "Republican")
helppoor2 <- recode(gss$helppoor, 'Govt Action' = "Yes", 'Agree With Both' = "No", 'People Help Selves' = "No")
newtable <- table(partyid2, helppoor2)
newtable
##              helppoor2
## partyid2       Yes   No
##   Democrat    3051 6955
##   Independent  777 2360
##   Republican   892 5765
##   Other Party   67  208
ggplot(gss, aes(partyid2, fill = helppoor2)) +
geom_bar(position = "dodge")
mosaicplot(newtable, shade = TRUE)
The proportion of Democrats who think the government should improve poor Americans’ standard of living is greater than the proportion of Republicans who think so. What happens if we count “Agree With Both” in the “Yes” column instead?
helppoor3 <- recode(gss$helppoor, 'Govt Action' = "Yes", 'Agree With Both' = "Yes", 'People Help Selves' = "No")
newtable2 <- table(partyid2, helppoor3)
newtable2
##              helppoor3
## partyid2       Yes   No
##   Democrat    9018  988
##   Independent 2733  404
##   Republican  5027 1630
##   Other Party  220   55
ggplot(gss, aes(partyid2, fill = helppoor3)) +
geom_bar(position = "dodge")
mosaicplot(newtable2, shade = TRUE)
The pattern is the same when “Agree With Both” is counted as “Yes, the government should improve poor Americans’ standard of living”: the proportion of “Yes” answers is still higher among Democrats than among Republicans.
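The comparison between the two groupings can be checked numerically. This is a minimal sketch that re-enters the counts from the two collapsed tables printed above; the helper `make_table` and the names `grouping_a` and `grouping_b` are hypothetical and used only for illustration.

```r
# Hypothetical helper: build a 4 x 2 table from counts listed row by row.
make_table <- function(counts) {
  matrix(counts, ncol = 2, byrow = TRUE,
         dimnames = list(c("Democrat", "Independent", "Republican", "Other Party"),
                         c("Yes", "No")))
}

# "Agree With Both" counted as "No" (counts from newtable).
grouping_a <- make_table(c(3051, 6955, 777, 2360, 892, 5765, 67, 208))
# "Agree With Both" counted as "Yes" (counts from newtable2).
grouping_b <- make_table(c(9018, 988, 2733, 404, 5027, 1630, 220, 55))

# Share of "Yes" answers within each party under each grouping.
yes_share_a <- prop.table(grouping_a, margin = 1)[, "Yes"]
yes_share_b <- prop.table(grouping_b, margin = 1)[, "Yes"]
round(rbind(yes_share_a, yes_share_b), 3)
```

Under both groupings the “Yes” share for Democrats is larger than the share for Republicans, which is the pattern the plots show.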
Now we test whether there is a relationship between opinion on improving poor Americans’ standard of living and party affiliation.
\(H_0\) : (Nothing going on) In the population, opinion on improving poor Americans’ standard of living and party affiliation are independent.
\(H_A\) : (Something going on) In the population, opinion on improving poor Americans’ standard of living and party affiliation are dependent.
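The conditions for the chi-square test of independence are met. Random sampling was used in this survey, and the sample size is smaller than 10% of the whole United States population, so the observations can be treated as independent. Each respondent’s answer corresponds to exactly one cell in the table. Every cell also has an expected count of at least 5.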
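The sample-size side of these conditions can be checked from the counts themselves. The sketch below assumes the usual rule of thumb that every expected cell count should be at least 5; it re-enters the counts of the first collapsed table, and the names `obs` and `expected` are illustrative.

```r
# Observed counts re-entered from newtable
# ("Agree With Both" counted as "No").
obs <- matrix(
  c(3051, 6955,
     777, 2360,
     892, 5765,
      67,  208),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("Democrat", "Independent", "Republican", "Other Party"),
                  c("Yes", "No"))
)

# Expected count under independence: (row total * column total) / grand total.
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
round(expected, 1)

# Rule-of-thumb check: TRUE when every expected count is at least 5.
all(expected >= 5)
```

The smallest expected count belongs to the “Other Party” row, and it is still well above 5.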
chisq.test(newtable)
##
## Pearson's Chi-squared test
##
## data: newtable
## X-squared = 644.92, df = 3, p-value < 2.2e-16
chisq.test(newtable2)
##
## Pearson's Chi-squared test
##
## data: newtable2
## X-squared = 678.61, df = 3, p-value < 2.2e-16
Both tests give p-values near zero (p < 2.2e-16), so we reject \(H_0\): the data provide strong evidence that opinion on improving poor Americans’ standard of living and party affiliation are dependent.
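To make the test less of a black box, the first statistic can be recomputed by hand from the printed counts. This is a minimal sketch, not the code used above; the names `obs`, `expected`, `x2`, `df` and `p_value` are illustrative.

```r
# Observed counts re-entered from newtable.
obs <- matrix(
  c(3051, 6955,
     777, 2360,
     892, 5765,
      67,  208),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("Democrat", "Independent", "Republican", "Other Party"),
                  c("Yes", "No"))
)

# Expected counts under independence.
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson chi-square statistic: sum over cells of (observed - expected)^2 / expected.
x2 <- sum((obs - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1).
df <- (nrow(obs) - 1) * (ncol(obs) - 1)

# Upper-tail probability of the chi-square distribution.
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

c(x2 = x2, df = df, p_value = p_value)
```

The statistic and degrees of freedom agree with the `chisq.test(newtable)` output (X-squared = 644.92, df = 3), since no continuity correction is applied to tables larger than 2 x 2.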