General Social Survey Week 5 Project of the course Inferential Statistics under the course track Statistics with R
Submitted by Olusola Afuwape April 17th 2019
library(ggplot2)
library(dplyr)
library(statsr)
load("gss.Rdata")
The General Social Survey (GSS) monitors changes in social trends and norms in American society. The study has collected data in the United States since 1972. The GSS questionnaire is diverse, and its scope covers myriad subjects, including economic and workplace issues such as income levels and standard of living, and health and family concerns such as divorce laws and the use of contraception. Other areas include the obligations and responsibilities of government, controversial social issues, personal concerns and national problems. Please visit GSS General Social Survey | NORC for more.
Scope of inference covers the spatial, temporal and ecological/socio-economic extents over which the study is applied. In the spatial context, the study examines trends in the United States and compares these with other national models of human society. From the temporal viewpoint, the study started in 1972 and data have been gathered regularly since then (annually in most years through 1993 and every two years since 1994). The ecological/socio-economic aspect deals with socio-economic trends and their effects on society.
The GSS is an observational study, so it can only show association between variables, not causation. Because it is a survey, it is prone to sampling biases such as non-response bias. The GSS uses random sampling, so its results can be generalized to the adult population of the United States, but without random assignment it cannot establish causality.
Suicide is a leading cause of death in the United States. How are some variables in the GSS data related to suicide rates in the United States?
Suicide is a leading cause of death in the United States. Suicide rates increased in nearly every state from 1999 through 2016. Mental health conditions are often seen as the cause of suicide, but suicide is rarely caused by any single factor. In fact, many people who died by suicide are not known to have had a diagnosed mental health condition at the time of death. Other problems often contribute to suicide, such as those related to relationships, substance use, physical health, and job, money, legal or housing stress. See Suicide rising across the US.
# Check the data dimension
dim(gss)
[1] 57061   114
# Select the suicide variables
gss_suicide <- gss %>% filter(!is.na(suicide1), !is.na(suicide2), !is.na(suicide3), !is.na(suicide4)) %>% select(suicide1, suicide2, suicide3, suicide4)
head(gss_suicide)
  suicide1 suicide2 suicide3 suicide4
1       No       No       No       No
2      Yes      Yes      Yes      Yes
3       No       No       No       No
4      Yes       No       No       No
5      Yes       No       No       No
6      Yes       No       No       No
# Plot the suicide variables
plot(gss_suicide$suicide1, xlab = "Suicide status", ylab = "Suicide level", col = "green", main = "Suicide due to incurable disease")
plot(gss_suicide$suicide2, xlab = "Suicide status", ylab = "Suicide level", col = "beige", main = "Suicide due to bankruptcy")
plot(gss_suicide$suicide3, xlab = "Suicide status", ylab = "Suicide level", col = "yellow", main = "Suicide due to dishonored family")
plot(gss_suicide$suicide4, xlab = "Suicide status", ylab = "Suicide level", col = "light blue", main = "Suicide due to 'tired of living'")
# Tabulate suicide1 variable
table(gss_suicide$suicide1)
  Yes    No
15191 12813
Comparing the plots for the various causes of suicide, suicide due to incurable disease shows the highest level of Yes. This high level of Yes reflects the view of many respondents that a person faced with an incurable disease may choose to end his or her life. According to one article, 10% of suicide cases are linked to chronic or terminal illness. The article states that “lack of attention paid to people with terminal or chronic illness committing suicide is a gross dereliction of duty on the part of government and health services.” See One in 10 suicides linked to chronic illness.
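The comparison of Yes responses across the four suicide variables can also be expressed as proportions. The following is a minimal sketch, assuming only the six rows printed by head(gss_suicide); the names first_rows and share_yes are hypothetical, and the full gss_suicide data frame would take the place of first_rows in practice.

```r
# Illustrative only: the six rows shown by head(gss_suicide), typed in by hand.
first_rows <- data.frame(
  suicide1 = c("No", "Yes", "No", "Yes", "Yes", "Yes"),
  suicide2 = c("No", "Yes", "No", "No", "No", "No"),
  suicide3 = c("No", "Yes", "No", "No", "No", "No"),
  suicide4 = c("No", "Yes", "No", "No", "No", "No")
)

# Proportion of "Yes" responses in each column.
share_yes <- sapply(first_rows, function(x) mean(x == "Yes"))
round(share_yes, 3)
```

Six rows are far too few to be representative; the sketch only shows the shape of the calculation.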
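Thus, this research question compares how the following GSS variables are related: suicide1 (suicide due to incurable disease), natheal (improving and protecting the nation's health) and conmedic (confidence in medicine).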
# Get and observe the required variables
gss_suicide1 <- gss %>% filter(!is.na(suicide1), !is.na(natheal), !is.na(conmedic)) %>%
select(suicide1, natheal, conmedic)
dim(gss_suicide1)
[1] 11364     3
head(gss_suicide1)
  suicide1     natheal     conmedic
1       No    Too Much A Great Deal
2      Yes  Too Little A Great Deal
3       No About Right    Only Some
4      Yes  Too Little A Great Deal
5      Yes About Right    Only Some
6      Yes  Too Little A Great Deal
table(gss_suicide1$natheal)
 Too Little About Right    Too Much
       7281        3331         752
table(gss_suicide1$conmedic)
A Great Deal    Only Some   Hardly Any
        5160         5198         1006
Most respondents (7,281 of 11,364) believed that too little is being done to improve and protect the nation's health. Slightly more respondents have only some confidence in medicine (5,198) than have a great deal of confidence (5,160).
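The counts can be turned into proportions to make these comparisons explicit. This is a minimal sketch, assuming the counts are typed in by hand from the two tables above (natheal_counts and conmedic_counts are hypothetical names). With the data loaded, prop.table(table(gss_suicide1$natheal)) should give the same shares.

```r
# Counts copied from the table() output for natheal and conmedic.
natheal_counts <- c("Too Little" = 7281, "About Right" = 3331, "Too Much" = 752)
conmedic_counts <- c("A Great Deal" = 5160, "Only Some" = 5198, "Hardly Any" = 1006)

# Share of respondents in each category.
natheal_share <- prop.table(natheal_counts)
conmedic_share <- prop.table(conmedic_counts)

round(natheal_share, 3)
round(conmedic_share, 3)
```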
# Plot variables suicide1, natheal and conmedic
nat_con <- ggplot(gss_suicide1) + aes(x = natheal, fill = conmedic) + geom_bar(position = "dodge")
nat_con <- nat_con + xlab("Nation's health") + ylab("Count") + scale_fill_discrete(name = "Confidence in medicine")
nat_con
s_natheal <- ggplot(gss_suicide1) + aes(x = natheal, fill = suicide1) + geom_bar(position = "dodge")
s_natheal <- s_natheal + xlab("Nation's health") + ylab("Count") + scale_fill_discrete(name = "Suicide due to incurable disease")
s_natheal
s_conmedic <- ggplot(gss_suicide1) + aes(x = conmedic, fill = suicide1) + geom_bar(position = "dodge")
s_conmedic <- s_conmedic + xlab("Confidence in medicine") + ylab("Count") + scale_fill_discrete(name = "Suicide due to incurable disease")
s_conmedic
chisq.test(gss_suicide1$natheal, gss_suicide1$conmedic)$expected
                    gss_suicide1$conmedic
gss_suicide1$natheal A Great Deal Only Some Hardly Any
         Too Little     3306.0507 3330.3976  644.55174
         About Right    1512.4921 1523.6306  294.87733
         Too Much        341.4572  343.9718   66.57093
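The expected counts follow from the marginal totals: under independence, each expected cell count is (row total x column total) / grand total. This is a minimal sketch, assuming the marginal totals from the natheal and conmedic tables are typed in by hand (row_totals, col_totals and expected are hypothetical names). It also checks the usual condition for the chi-square test that every expected count is at least 5.

```r
# Marginal totals copied from the table() output.
row_totals <- c("Too Little" = 7281, "About Right" = 3331, "Too Much" = 752)      # natheal
col_totals <- c("A Great Deal" = 5160, "Only Some" = 5198, "Hardly Any" = 1006)   # conmedic
n <- sum(row_totals)

# Expected count for each cell: row total * column total / grand total.
expected <- outer(row_totals, col_totals) / n
round(expected, 4)

# Sample-size condition: every expected count should be at least 5.
all(expected >= 5)
```

The values should agree with the $expected component returned by chisq.test.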
This analysis examines two relationships: first, between the variables suicide1 and natheal; second, between the variables suicide1 and conmedic.
A chi-square test of independence will be used for both analyses. In each case, the null hypothesis is that the two variables are independent, and the alternative hypothesis is that they are dependent.
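To make the mechanics of the test concrete, the sketch below computes the chi-square statistic by hand and compares it with the result of chisq.test. The 2 x 3 table of counts is hypothetical (made up for illustration, not taken from the GSS), as are the names observed, expected, x2_manual and p_manual.

```r
# Hypothetical counts, for illustration only (not GSS data).
observed <- matrix(c(40, 30, 30,
                     20, 35, 45),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(response = c("Yes", "No"),
                                   group = c("A", "B", "C")))

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected.
x2_manual <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_manual <- pchisq(x2_manual, df = df, lower.tail = FALSE)

# chisq.test applies a continuity correction only to 2 x 2 tables,
# so the built-in result should match the manual one here.
result <- chisq.test(observed)
c(manual = x2_manual, builtin = unname(result$statistic))
c(manual = p_manual, builtin = result$p.value)
```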
chisq.test(gss_suicide1$suicide1, gss_suicide1$natheal)
Pearson's Chi-squared test
data: gss_suicide1$suicide1 and gss_suicide1$natheal
X-squared = 33.151, df = 2, p-value = 6.33e-08
chisq.test(gss_suicide1$suicide1, gss_suicide1$conmedic)
Pearson's Chi-squared test
data: gss_suicide1$suicide1 and gss_suicide1$conmedic
X-squared = 6.9492, df = 2, p-value = 0.03097
chisq.test(gss_suicide1$natheal, gss_suicide1$conmedic)
Pearson's Chi-squared test
data: gss_suicide1$natheal and gss_suicide1$conmedic
X-squared = 159.72, df = 4, p-value < 2.2e-16
The chi-square statistic for the test of independence between the variables suicide1 and natheal is 33.151 (df = 2), with a very low p-value (6.33e-08, far below 0.05). Likewise, the chi-square statistic for the variables suicide1 and conmedic is 6.9492 (df = 2), with a low p-value (0.03097, below 0.05).
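The reported p-values can be reproduced from the test statistics and the degrees of freedom. This is a minimal sketch, assuming the statistics printed in the chisq.test output; the degrees of freedom are (rows - 1) x (columns - 1), which gives 2 for a 2 x 3 table. The names p_natheal and p_conmedic are hypothetical.

```r
# Upper-tail probability of the chi-square distribution with 2 degrees of freedom.
p_natheal  <- pchisq(33.151, df = (2 - 1) * (3 - 1), lower.tail = FALSE)
p_conmedic <- pchisq(6.9492, df = (2 - 1) * (3 - 1), lower.tail = FALSE)

signif(p_natheal, 3)
signif(p_conmedic, 4)

# Compare each p-value with the 5% significance level.
c(natheal = p_natheal < 0.05, conmedic = p_conmedic < 0.05)
```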
From the results, there is convincing evidence to reject the null hypothesis in favor of the alternative hypothesis that suicide due to incurable disease and nation's health are dependent. The results also provide evidence, at the 5% significance level, to reject the null hypothesis in favor of the alternative hypothesis that suicide due to incurable disease and confidence in medicine are dependent. Because the GSS is an observational study, these results indicate association, not causation.