General Social Survey Week 5 Project of the course Inferential Statistics under the course track Statistics with R

Submitted by Olusola Afuwape April 17th 2019

Setup

Load packages

library(ggplot2)
library(dplyr)
library(statsr)

Load data

load("gss.Rdata")

Part 1: Data

Overview

The General Social Survey (GSS) monitors changes in social trends and norms in American society and has collected data in the United States since 1972. The GSS questionnaire covers a wide range of subjects, including economic and workplace issues such as income levels and standard of living, and health and family concerns such as divorce laws and the use of contraception. Other areas include the obligations and responsibilities of government, controversial social issues, personal concerns and national problems. Please visit GSS General Social Survey|NORC for more.

Scope of Inference

The scope of inference covers the spatial, temporal and ecological/socio-economic extents over which the study applies. Spatially, the study examines trends in the United States and allows comparison with other national models of human society. Temporally, the study started in 1972 and new data have been gathered regularly since then. The ecological/socio-economic aspect deals with socio-economic trends and their effects on society.

GSS is an observational study, so it can show association between variables but cannot establish causality. As a survey, it is also prone to biases such as non-response bias. Because respondents are randomly sampled, the findings can be generalized to the population of the United States.

Part 2: Research question

Suicide is a leading cause of death in the United States. How are some variables in the GSS data related to suicide rates in the United States?

Suicide rates increased in nearly every state from 1999 through 2016. Mental health conditions are often seen as the cause of suicide, but suicide is rarely caused by any single factor. In fact, many people who died by suicide were not known to have a diagnosed mental health condition at the time of death. Other problems often contribute to suicide, such as those related to relationships, substance use, physical health, and job, money, legal or housing stress. See Suicide rising across the US


Part 3: Exploratory data analysis

# Check the data dimension
dim(gss)
[1] 57061   114
# Select the suicide variables
gss_suicide <- gss %>%
        filter(!is.na(suicide1), !is.na(suicide2), !is.na(suicide3), !is.na(suicide4)) %>%
        select(suicide1, suicide2, suicide3, suicide4)

head(gss_suicide)
  suicide1 suicide2 suicide3 suicide4
1       No       No       No       No
2      Yes      Yes      Yes      Yes
3       No       No       No       No
4      Yes       No       No       No
5      Yes       No       No       No
6      Yes       No       No       No
# Plot the suicide variables
plot(gss_suicide$suicide1, xlab = "Suicide status", ylab = "Suicide level", col = "green", main = "Suicide due to incurable disease")

plot(gss_suicide$suicide2, xlab = "Suicide status", ylab = "Suicide level", col = "beige", main = "Suicide due to bankruptcy")

plot(gss_suicide$suicide3, xlab = "Suicide status", ylab = "Suicide level", col = "yellow", main = "Suicide due to dishonored family")

plot(gss_suicide$suicide4, xlab = "Suicide status", ylab = "Suicide level", col = "light blue", main = "Suicide due to 'tired of living'")

# Tabulate suicide1 variable

table(gss_suicide$suicide1)

  Yes    No 
15191 12813 
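
As a rough sketch, the counts tabulated above can be converted to proportions, which makes the split between Yes and No easier to read. The name `suicide1_counts` is introduced here for illustration only, and the counts are copied from the table output rather than recomputed from `gss`.

```r
# Counts copied from the table(gss_suicide$suicide1) output above
suicide1_counts <- c(Yes = 15191, No = 12813)

# Convert the counts to proportions of all complete responses
suicide1_props <- prop.table(suicide1_counts)
round(suicide1_props, 3)
```

With these counts, a little over half of the respondents (about 54%) answered Yes.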

Discussion

Comparing the plots for the various causes of suicide, suicide due to incurable disease shows the highest level of Yes. This high share of Yes responses suggests that many respondents consider ending one's life acceptable when faced with an incurable disease. According to one article, 10% of suicide cases are linked to chronic or terminal illness. It stated that “lack of attention paid to people with terminal or chronic illness committing suicide is a gross dereliction of duty on the part of government and health services.” See One in 10 suicides linked to chronic illness

Thus, the analysis will examine how the following GSS variables are related:

  1. suicide1 - a categorical variable on suicide if incurable disease
  2. natheal - a categorical variable on improving and protecting nation’s health
  3. conmedic - a categorical variable on confidence in medicine
# Get and observe the required variables

gss_suicide1 <- gss %>% filter(!is.na(suicide1), !is.na(natheal), !is.na(conmedic)) %>%
        select(suicide1, natheal, conmedic)

dim(gss_suicide1)
[1] 11364     3
head(gss_suicide1)
  suicide1     natheal     conmedic
1       No    Too Much A Great Deal
2      Yes  Too Little A Great Deal
3       No About Right    Only Some
4      Yes  Too Little A Great Deal
5      Yes About Right    Only Some
6      Yes  Too Little A Great Deal
table(gss_suicide1$natheal)

 Too Little About Right    Too Much 
       7281        3331         752 
table(gss_suicide1$conmedic)

A Great Deal    Only Some   Hardly Any 
        5160         5198         1006 

Most respondents believed that too little is being done to improve and protect the nation’s health. Likewise, slightly more respondents have only some confidence in medicine than have a great deal of confidence.

# Plot variables suicide1, natheal and conmedic

nat_con <- ggplot(gss_suicide1) + aes(x = natheal, fill = conmedic) + geom_bar(position = "dodge")
nat_con <- nat_con + xlab("Nation's health") + ylab("Count") + scale_fill_discrete(name = "Confidence in medicine")
nat_con

s_natheal <- ggplot(gss_suicide1) + aes(x = natheal, fill = suicide1) + geom_bar(position = "dodge")
s_natheal <- s_natheal + xlab("Nation's health") + ylab("Count") + scale_fill_discrete(name = "Suicide due to incurable disease")
s_natheal

s_conmedic <- ggplot(gss_suicide1) + aes(x = conmedic, fill = suicide1) + geom_bar(position = "dodge")
s_conmedic <- s_conmedic + xlab("Confidence in medicine") + ylab("Count") + scale_fill_discrete(name = "Suicide due to incurable disease")
s_conmedic

chisq.test(gss_suicide1$natheal, gss_suicide1$conmedic)$expected
                    gss_suicide1$conmedic
gss_suicide1$natheal A Great Deal Only Some Hardly Any
         Too Little     3306.0507 3330.3976  644.55174
         About Right    1512.4921 1523.6306  294.87733
         Too Much        341.4572  343.9718   66.57093
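
The expected counts above follow from the marginal totals alone: under independence, each expected cell count is (row total × column total) / grand total. A minimal sketch, assuming the marginal counts printed earlier by `table()`; the object names are illustrative and not part of the original analysis.

```r
# Marginal counts copied from the table() outputs above
natheal_totals  <- c("Too Little" = 7281, "About Right" = 3331, "Too Much" = 752)
conmedic_totals <- c("A Great Deal" = 5160, "Only Some" = 5198, "Hardly Any" = 1006)
n <- sum(natheal_totals)

# Expected count under independence: row total * column total / n
expected <- outer(natheal_totals, conmedic_totals) / n
round(expected, 2)

# Smallest expected count, to compare against the usual threshold of 5
min(expected)
```

The smallest expected count falls in the Too Much / Hardly Any cell and is well above 5.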

Part 4: Inference

This analysis will examine two relationships: first, between the variables suicide1 and natheal, and second, between the variables suicide1 and conmedic.

A chi-square test of independence will be used for both analyses.

Hypotheses

  1. Null Hypothesis (H0): Suicide due to incurable disease is independent of nation’s health. Suicide due to incurable disease is also independent of confidence in medicine.
  2. Alternative Hypothesis (HA): Suicide due to incurable disease is dependent on nation’s health. Also, suicide due to incurable disease is dependent on confidence in medicine.

Conditions for hypothesis

  1. Independence: General Social Survey (GSS) respondents are selected by random sampling, so the observations can be treated as independent of one another.
  2. Variables: The variables under consideration are categorical. The chi-square test of independence is used to assess the relationship between two categorical variables.
  3. Expected values: Each cell of the two-way table has an expected count of at least 5.
  4. Sample size: The sample is less than 10% of the United States population.
  5. Degrees of freedom (DF): Each categorical variable contributes its number of levels (k) minus 1, so suicide1 contributes 1, natheal 2 and conmedic 2. The DF of each test is the product of the two contributions, so each test involving suicide1 has 1 × 2 = 2 DF.
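
The degrees of freedom for each pair of variables can be checked with a short sketch. For a test of independence on a two-way table they are (rows - 1) × (columns - 1). The helper `df_independence` is hypothetical and introduced only for illustration; the level counts are taken from the tables shown earlier.

```r
# Hypothetical helper: degrees of freedom for an r x c contingency table
df_independence <- function(n_rows, n_cols) (n_rows - 1) * (n_cols - 1)

df_suicide_natheal  <- df_independence(2, 3)  # suicide1 (2 levels) x natheal (3 levels)
df_suicide_conmedic <- df_independence(2, 3)  # suicide1 (2 levels) x conmedic (3 levels)
df_natheal_conmedic <- df_independence(3, 3)  # natheal (3 levels) x conmedic (3 levels)

# Critical value of the chi-square distribution at the 5% significance level
qchisq(0.95, df = df_suicide_natheal)
```

A test statistic larger than this critical value corresponds to a p-value below 0.05.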

Chi-square tests of independence

chisq.test(gss_suicide1$suicide1, gss_suicide1$natheal)

    Pearson's Chi-squared test

data:  gss_suicide1$suicide1 and gss_suicide1$natheal
X-squared = 33.151, df = 2, p-value = 6.33e-08
chisq.test(gss_suicide1$suicide1, gss_suicide1$conmedic)

    Pearson's Chi-squared test

data:  gss_suicide1$suicide1 and gss_suicide1$conmedic
X-squared = 6.9492, df = 2, p-value = 0.03097
chisq.test(gss_suicide1$natheal, gss_suicide1$conmedic)

    Pearson's Chi-squared test

data:  gss_suicide1$natheal and gss_suicide1$conmedic
X-squared = 159.72, df = 4, p-value < 2.2e-16

Results

The chi-square test of independence between suicide1 and natheal gives a statistic of 33.151 (df = 2) with a very low p-value (6.33e-08, far below 0.05). Likewise, the test between suicide1 and conmedic gives a statistic of 6.9492 (df = 2) with a p-value of 0.03097, which is also below 0.05.
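
To make the link between the statistics and the p-values explicit, the following sketch recomputes each p-value as the upper-tail probability of the chi-square distribution. The statistics and degrees of freedom are copied from the `chisq.test()` output above; the object names are illustrative.

```r
# Test statistics and degrees of freedom copied from the chisq.test() output above
p_natheal  <- pchisq(33.151, df = 2, lower.tail = FALSE)
p_conmedic <- pchisq(6.9492, df = 2, lower.tail = FALSE)

# Compare each p-value with the 5% significance level
c(natheal = p_natheal, conmedic = p_conmedic) < 0.05
```

Both comparisons are true, which agrees with the p-values reported by `chisq.test()`.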

Conclusion

From the results, there is convincing evidence to reject the null hypothesis in favor of the alternative hypothesis that suicide due to incurable disease and nation’s health are dependent. The results also provide evidence, at the 5% significance level, to reject the null hypothesis in favor of the alternative hypothesis that suicide due to incurable disease and confidence in medicine are dependent. Because GSS is an observational study, these results indicate association and not causation.