Since 1972, the General Social Survey (GSS) has been monitoring societal change and studying the growing complexity of American society.
## [1] 57061 114
The General Social Survey (GSS) is a sociological survey used to collect information and keep a historical record of the concerns, experiences, attitudes, and practices of residents of the United States. GSS questions cover a diverse range of issues including national spending priorities, marijuana use, crime and punishment, race relations, quality of life, confidence in institutions, and sexual behavior. The dataset used for this project is an extract of the General Social Survey (GSS) Cumulative File 1972-2012. It consists of 57,061 observations of 114 variables. Each variable corresponds to a specific question asked of the respondent.
According to Wikipedia, the GSS is conducted face-to-face, as an in-person interview, by NORC at the University of Chicago. The target population is adults (18+) living in households in the United States. Respondents are randomly sampled from a mix of urban, suburban, and rural geographic areas. Participation in the study is strictly voluntary.
The sample data should allow us to generalize to the population of interest: the extract covers 57,061 U.S. adults aged 18 years or older, and the survey is based on random sampling. However, causation cannot be established, because the GSS is an observational study that can only show correlation or association between variables. In addition, non-response is a potential source of bias: this is a voluntary in-person survey that takes approximately 90 minutes, so some potential respondents may choose not to participate.
Do individuals who identify as atheist or non-religious express significantly different views on controversial social issues and economic matters than religious people? This research question touches on a larger question of the relationship between rationalism and individualism. Does the absence of belief in an afterlife and an ultimate judge leave atheists feeling less concerned with social welfare?
The religious affiliations recorded in the survey are:
## # A tibble: 14 x 2
## relig n
## <fct> <int>
## 1 Protestant 916
## 2 Catholic 444
## 3 Jewish 28
## 4 None 387
## 5 Other 25
## 6 Buddhism 6
## 7 Hinduism 9
## 8 Other Eastern 5
## 9 Moslem/Islam 13
## 10 Orthodox-Christian 6
## 11 Christian 120
## 12 Native American 6
## 13 Inter-Nondenominational 2
## 14 <NA> 7
Selecting the attributes that are of interest for the study:
# Keep only the variables used in this study
target_data <- gss %>%
  select(year, coninc, natsoc, relig, attend, homosex, abany)
# We want to see recent trends, so keep only the 2012 survey
gss_2012 <- target_data %>%
  filter(year == 2012) %>%
  mutate(is_religious = factor(ifelse(relig == "None", "no", "yes")))
prop.table(table(gss_2012$is_religious))
##
## no yes
## 0.1967463 0.8032537
Approximately 20% of respondents in 2012 reported no religion (treated here as non-believers or atheists), while the remaining 80% identified with some religion.
ggplot(gss_2012,aes(x=attend, fill =is_religious))+
ggtitle("Respondent service attendance in 2012 survey") +
xlab("") + ylab("Respondents") +
labs(fill = "Religious") +
geom_bar(position = position_dodge()) +
scale_x_discrete(labels = function(attend) lapply(strwrap(attend, width = 10,
simplify = FALSE), paste, collapse="\n"))
The plot above shows that being religious and attending services are not the same thing. There are religious respondents who do not attend religious services, and non-religious respondents who do attend them.
The GSS variables and survey questions, along with the tabulated results, are:
## is_religious
## homosex no yes
## Always Wrong 42 523
## Almst Always Wrg 4 33
## Sometimes Wrong 18 64
## Not Wrong At All 161 392
## Other 0 0
## is_religious
## abany no yes
## Yes 153 401
## No 75 617
gss_2012 %>%
filter(!is.na(coninc), !is.na(is_religious)) %>%
ggplot(aes(x = is_religious, y = coninc)) +
ggtitle("Family Income in 2012 survey") +
xlab("Religious") + ylab("Constant Dollars") +
geom_boxplot()
Family income may provide some insight into the relationship between religion and wealth, although we cannot conclude anything about causality. The box plot above suggests that both groups have about equal IQRs, but that non-religious respondents may have slightly higher median incomes.
Again, the question posed to respondents was: What about sexual relations between two adults of the same sex?
A bar chart shows that most non-religious respondents (161 of 225) consider homosexual relations not wrong at all. Among religious respondents, by contrast, a slight majority (523 of 1,012) consider them always wrong.
gss_homosex <- gss_2012 %>%
  select(homosex, is_religious) %>%
  filter(!is.na(homosex), !is.na(is_religious))
ggplot(gss_homosex,aes(x=homosex, fill =is_religious))+
ggtitle("Response on Homosexual relationships in 2012 survey") +
xlab("") + ylab("Respondents") +
labs(fill = "Religious") +
geom_bar(position = position_dodge()) +
scale_x_discrete(labels = function(homosex) lapply(strwrap(homosex, width = 10,
simplify = FALSE), paste, collapse="\n"))
We test this with a two-sample test of proportions. The test applies if three conditions are met:
- Independence within groups: this is a random sample from <10% of the population.
- Independence between groups: this is unpaired data.
- Sample size/skew: at least 10 successes and 10 failures in both the religious and non-religious groups.
Our null hypothesis is H0: the proportion of respondents who believe homosexual relations are not wrong at all is the same for religious and non-religious respondents. The alternative hypothesis is HA: the proportions differ.
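The sample-size condition can be checked directly from the cross-tabulation of `homosex` by `is_religious` shown earlier. The following is a minimal sketch that re-enters those counts by hand; the object names `counts` and `meets_condition` are illustrative and not part of the original analysis, and every answer other than "Not Wrong At All" is collapsed into a single failure category.

```r
# Counts copied from the homosex x is_religious table for 2012.
# "success" = answered "Not Wrong At All"; "failure" = any other answer.
counts <- matrix(
  c(161, 42 + 4 + 18,     # non-religious: successes, failures
    392, 523 + 33 + 64),  # religious: successes, failures
  nrow = 2, byrow = TRUE,
  dimnames = list(is_religious = c("no", "yes"),
                  response = c("success", "failure"))
)

# Success-failure condition: every cell should hold at least 10 respondents.
meets_condition <- all(counts >= 10)

print(counts)
print(meets_condition)
```

Working from the printed table rather than the raw data keeps the check independent of how missing values were filtered.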
gss_homosex <- gss_homosex %>% mutate(response = ifelse(homosex == "Not Wrong At All", "Not Wrong at all", "Other"))
prop.test(table(gss_homosex$is_religious, gss_homosex$response), correct = FALSE)
##
## 2-sample test for equality of proportions without continuity
## correction
##
## data: table(gss_homosex$is_religious, gss_homosex$response)
## X-squared = 80.212, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.2620539 0.3943536
## sample estimates:
## prop 1 prop 2
## 0.7155556 0.3873518
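As a cross-check, the test statistic can be recomputed by hand from the tabulated counts. This is a sketch of the standard pooled two-proportion z-test, not the original author's code; the names `x`, `n`, `p_pool`, and `z` are illustrative. Under these assumptions, `z^2` should agree with the `X-squared` value reported by `prop.test()` up to rounding.

```r
# Counts taken from the homosex x is_religious table (2012).
x <- c(no = 161, yes = 392)    # answered "Not Wrong At All"
n <- c(no = 225, yes = 1012)   # respondents with a non-missing answer

p_hat  <- x / n                # sample proportion in each group
p_pool <- sum(x) / sum(n)      # pooled proportion under H0

# Standard error of the difference under H0, then the z statistic
se <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z  <- unname((p_hat["no"] - p_hat["yes"]) / se)

chi_sq  <- z^2                 # comparable to X-squared from prop.test()
p_value <- 2 * pnorm(-abs(z))  # two-sided p-value

print(p_hat)
print(c(z = z, chi_sq = chi_sq, p_value = p_value))
```

Here "prop 1" in the output corresponds to the non-religious group (`is_religious == "no"`) and "prop 2" to the religious group.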
Again, the question posed to respondents was: Please tell me whether or not you think it should be possible for a pregnant woman to obtain a legal abortion if: g. The woman wants it for any reason?
gss_abortion <- gss_2012 %>%
  select(abany, is_religious) %>%
  filter(!is.na(abany), !is.na(is_religious))
ggplot(gss_abortion,aes(x=abany, fill =is_religious))+
ggtitle("Response on Abortion in 2012 survey") +
xlab("") + ylab("Respondents") +
labs(fill = "Religious") +
geom_bar(position = position_dodge()) +
scale_x_discrete(labels = function(abany) lapply(strwrap(abany, width = 10,
simplify = FALSE), paste, collapse="\n"))
We again use a two-sample test of proportions. The test applies if three conditions are met:
- Independence within groups: this is a random sample from <10% of the population.
- Independence between groups: this is unpaired data.
- Sample size/skew: at least 10 successes and 10 failures in both the religious and non-religious groups.
Our null hypothesis is H0: the proportion of respondents who think a woman should be able to obtain a legal abortion for any reason is the same for religious and non-religious respondents. The alternative hypothesis is HA: the proportions differ.
prop.test(table(gss_abortion$is_religious, gss_abortion$abany), correct = FALSE)
##
## 2-sample test for equality of proportions without continuity
## correction
##
## data: table(gss_abortion$is_religious, gss_abortion$abany)
## X-squared = 57.942, df = 1, p-value = 2.7e-14
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.2091719 0.3451141
## sample estimates:
## prop 1 prop 2
## 0.6710526 0.3939096
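The same test can be run from the tabulated counts alone, which is useful when the raw data are not at hand. The sketch below passes counts of "Yes" answers and group sizes to `prop.test()`; the names `yes_counts`, `group_size`, and `res` are illustrative. The results should match the output above up to rounding.

```r
# Counts from the abany x is_religious table (2012).
yes_counts <- c(no = 153, yes = 401)             # answered "Yes" to abany
group_size <- c(no = 153 + 75, yes = 401 + 617)  # all non-missing answers

# Two-sample test of proportions without continuity correction
res <- prop.test(yes_counts, group_size, correct = FALSE)

print(res$estimate)   # proportion answering "Yes" in each group
print(res$conf.int)   # 95% CI for the difference (non-religious minus religious)
print(res$statistic)  # chi-squared statistic
print(res$p.value)
```

Because the confidence interval for the difference lies entirely above zero, the data suggest a higher share of "Yes" answers among non-religious respondents; as noted earlier, this is an association and not a causal claim.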
The variable coninc is a continuous numeric variable that records total family income in constant dollars, and is therefore only an indirect measure of the income of the respondent.
Perform a one-sample t-test to estimate a confidence interval for mean family income among religious and non-religious individuals.
First, check the conditions for the sampling distribution of the sample mean:
gss_coninc_2012 <- gss_2012%>% filter(!is.na(coninc), !is.na(is_religious))
qqnorm(gss_coninc_2012$coninc)
qqline(gss_coninc_2012$coninc)
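It appears that our sample violates the normality condition because it is strongly right-skewed. However, both groups are large (1,405 religious and 348 non-religious respondents with a recorded income), so the sampling distribution of the mean should still be approximately normal, and we continue with the analysis.
Construct a 95% confidence interval for mean family income for religious and non-religious individuals.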
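# Split family income by group, reusing the NA-filtered data from above
inc.relig <- gss_coninc_2012$coninc[gss_coninc_2012$is_religious == "yes"]
inc.nonrelig <- gss_coninc_2012$coninc[gss_coninc_2012$is_religious == "no"]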
t.test(inc.relig)
##
## One Sample t-test
##
## data: inc.relig
## t = 38.893, df = 1404, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 44773.52 49529.88
## sample estimates:
## mean of x
## 47151.7
t.test(inc.nonrelig)
##
## One Sample t-test
##
## data: inc.nonrelig
## t = 19.446, df = 347, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 47918.84 58703.08
## sample estimates:
## mean of x
## 53310.96
The 95% confidence interval for mean family income among religious individuals is $47,151.70 +/- $2,378.18.
The 95% confidence interval for mean family income among non-religious individuals is $53,310.96 +/- $5,392.12.
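The margins of error quoted here are half the width of each reported interval. The sketch below recomputes them from the printed output; `margin()` is a hypothetical helper, not part of the original analysis. It also cross-checks the religious group using the fact that a one-sample test against 0 has t = mean / SE, so SE = mean / t.

```r
# Interval endpoints copied from the two one-sample t-test outputs.
ci_relig    <- c(44773.52, 49529.88)
ci_nonrelig <- c(47918.84, 58703.08)

# Margin of error = half the width of the confidence interval.
margin <- function(ci) diff(ci) / 2

print(margin(ci_relig))
print(margin(ci_nonrelig))

# Cross-check for the religious group: critical t value times standard error.
# The test was against mu = 0, so SE = mean / t (using the rounded printed t).
se_relig <- 47151.70 / 38.893
me_relig <- qt(0.975, df = 1404) * se_relig
print(me_relig)
```

The cross-check uses rounded inputs, so it should agree with the interval-based margin only approximately.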
Are these mean family incomes significantly different? Perform a two-sample comparison of means (Welch's t-test, the default in R).
t.test(inc.relig, inc.nonrelig)
##
## Welch Two Sample t-test
##
## data: inc.relig and inc.nonrelig
## t = -2.0547, df = 491.34, p-value = 0.04044
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -12049.0119 -269.5026
## sample estimates:
## mean of x mean of y
## 47151.70 53310.96
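The Welch statistic can be reconstructed from the summary values printed by the two one-sample tests, which makes the calculation behind the output explicit. This is a sketch under the assumption that the rounded printed values are accurate enough; all object names are illustrative. The standard errors are recovered as SE = mean / t, since each one-sample test was against a mean of 0.

```r
# Summary values copied from the one-sample t-test outputs.
m1 <- 47151.70; t1 <- 38.893; n1 <- 1405   # religious (df = 1404)
m2 <- 53310.96; t2 <- 19.446; n2 <- 348    # non-religious (df = 347)

# Standard error of each group mean
se1 <- m1 / t1
se2 <- m2 / t2

# Welch t statistic for the difference in means
se_diff <- sqrt(se1^2 + se2^2)
t_welch <- (m1 - m2) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se1^2 + se2^2)^2 / (se1^4 / (n1 - 1) + se2^4 / (n2 - 1))

# Two-sided p-value and 95% confidence interval for the difference
p_value <- 2 * pt(-abs(t_welch), df_welch)
ci <- (m1 - m2) + c(-1, 1) * qt(0.975, df_welch) * se_diff

print(c(t = t_welch, df = df_welch, p_value = p_value))
print(ci)
```

With a p-value just under 0.05 and an interval whose upper end is close to zero, the evidence for a difference in mean family income is modest, and the right skew noted in the Q-Q plot is a further reason to read it cautiously.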