Setup

Load packages

library(ggplot2)
library(dplyr)
library(tidyr)
library(statsr)

Load data

The data file gss.rds must be in the same directory as the R Markdown file. Once loaded, the data frame is called gss.

gss = readRDS("gss.rds")

Part 1: Data

The General Social Survey (GSS) is a sociological survey that has been conducted regularly since 1972. Its target population is adults living in households in the United States. The GSS sample is drawn using an area probability design that randomly selects households across the United States, so respondents come from a mix of urban, suburban, and rural areas. Because respondents are chosen by random sampling, the results can be generalized to the target population. However, the GSS is an observational study with no random assignment, so it can reveal associations between variables but cannot establish causation. Participation is strictly voluntary, so the results may suffer from non-response bias if people who decline to take part differ systematically from those who respond.


Part 2: Research question

The survey asks respondents whether a person has the right to end his or her own life for each of four reasons. Does the proportion of respondents who support suicide differ across these four reasons?

H0: Between 2008 and 2012, opinion towards suicide is independent of suicidal motives.

Ha: Between 2008 and 2012, opinion towards suicide is dependent on suicidal motives.

This research question might be of interest to social scientists who are interested in public opinion towards suicide.


Part 3: Exploratory data analysis

##                    opinion
## reason                No  Yes
##   INCURABLE DISEASE 1551 2237
##   BANKRUPT          3384  404
##   DISHONORED FAMILY 3381  407
##   TIRED OF LIVING   3095  693
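The table above, and the `chisq.test(table(suicide))` call in Part 4, use an object named `suicide` whose construction is not shown. The sketch below is one hypothetical way to build it with dplyr and tidyr. It assumes the four items are stored in columns named `suicide1` to `suicide4` (in the order incurable disease, bankrupt, dishonored family, tired of living) next to a `year` column, and it uses a tiny made-up data frame, `gss_toy`, in place of `gss` so that it runs on its own. Respondents with any missing item are dropped, which is consistent with every row of the table above summing to 3,788.

```r
library(dplyr)
library(tidyr)

# HYPOTHETICAL stand-in for gss: one row per respondent with the survey year
# and four Yes/No answers. Column names suicide1..suicide4 are an assumption.
gss_toy <- data.frame(
  year     = c(2006, 2008, 2010, 2012),
  suicide1 = c("Yes", "Yes", "No",  "Yes"),
  suicide2 = c("No",  "No",  "No",  "Yes"),
  suicide3 = c("No",  "No",  NA,    "No"),
  suicide4 = c("Yes", "No",  "No",  "Yes"),
  stringsAsFactors = FALSE
)

suicide <- gss_toy %>%
  filter(year >= 2008, year <= 2012) %>%   # keep the 2008-2012 survey years
  select(suicide1:suicide4) %>%            # keep only the four opinion items
  drop_na() %>%                            # keep respondents who answered all four
  pivot_longer(everything(), names_to = "reason", values_to = "opinion") %>%
  mutate(
    # Assumed mapping from column name to reason label
    reason = factor(reason,
                    levels = c("suicide1", "suicide2", "suicide3", "suicide4"),
                    labels = c("INCURABLE DISEASE", "BANKRUPT",
                               "DISHONORED FAMILY", "TIRED OF LIVING")),
    opinion = factor(opinion, levels = c("No", "Yes"))
  )

# Rows = reason, columns = opinion, laid out like the table above
table(suicide)
```

Reshaping to one row per (respondent, reason) pair is what lets `table()` cross-tabulate reason against opinion directly.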

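A substantial share of respondents oppose suicide for three of the four reasons. However, respondents are far more accepting when the reason is “INCURABLE DISEASE”: 2,237 of 3,788 (about 59%) answered Yes, compared with about 11% for “BANKRUPT” and “DISHONORED FAMILY” and about 18% for “TIRED OF LIVING”.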
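Row proportions make the comparison across reasons explicit. The sketch below re-enters the counts from the table above by hand (the object names `counts` and `tab` are illustrative) and uses base R's `prop.table()` to compute the share of No and Yes answers within each reason.

```r
# Observed counts copied from the contingency table above
# (rows = reason, columns = opinion)
counts <- matrix(
  c(1551, 2237,
    3384,  404,
    3381,  407,
    3095,  693),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    reason  = c("INCURABLE DISEASE", "BANKRUPT",
                "DISHONORED FAMILY", "TIRED OF LIVING"),
    opinion = c("No", "Yes")
  )
)
tab <- as.table(counts)

# Share of No and Yes within each reason; margin = 1 makes each row sum to 1
round(prop.table(tab, margin = 1), 3)
```

The Yes share is about 0.59 for INCURABLE DISEASE, compared with about 0.11 for BANKRUPT and DISHONORED FAMILY and about 0.18 for TIRED OF LIVING.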


Part 4: Inference

Chi-square test of Independence

We have two variables:

  1. Opinion

    * Yes
    * No
  2. Suicidal motives

    * INCURABLE DISEASE 
    * BANKRUPT    
    * DISHONORED FAMILY
    * TIRED OF LIVING 

We use a chi-square test of independence because we want to evaluate the relationship between two categorical variables, at least one of which has more than two levels.
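For reference, the test statistic compares the observed count $O_{ij}$ in each cell with the count $E_{ij}$ expected if opinion were independent of reason, summed over the $R = 4$ reasons and $C = 2$ opinions:

$$
\chi^2 = \sum_{i=1}^{R}\sum_{j=1}^{C}\frac{(O_{ij}-E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{table total}},
\qquad
df = (R-1)(C-1) = (4-1)(2-1) = 3.
$$

Large departures of observed counts from expected counts give a large $\chi^2$ and a small p-value.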

Conditions:

  • the observations should be independent
  • expected counts for each cell should be at least 5
  • degrees of freedom should be at least 2 (if not, use methods for evaluating proportions)

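The expected-count and degrees-of-freedom conditions are met: the smallest expected count is about 935, and df = (4 - 1)(2 - 1) = 3. Independence across respondents is reasonable, since the GSS is a random sample of far less than 10% of US adults. However, every row of the table sums to 3,788, which suggests that the same respondents answered all four questions. Observations in different rows are then not independent, so the p-value below should be read as approximate; a method for repeated binary responses from the same subjects, such as Cochran's Q test, would respect this paired structure. With this caveat, we use a theoretical approach.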
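The expected-count and degrees-of-freedom conditions can be checked directly from the observed table. The sketch below re-enters the counts from Part 3 by hand (the names `counts`, `expected`, and `dof` are illustrative) and computes each expected count as row total times column total divided by the grand total.

```r
# Observed counts from Part 3 (rows = reason, columns = opinion)
counts <- matrix(
  c(1551, 2237,
    3384,  404,
    3381,  407,
    3095,  693),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    reason  = c("INCURABLE DISEASE", "BANKRUPT",
                "DISHONORED FAMILY", "TIRED OF LIVING"),
    opinion = c("No", "Yes")
  )
)

# Expected count for each cell under independence
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
expected

# Degrees of freedom: (rows - 1) * (columns - 1)
dof <- (nrow(counts) - 1) * (ncol(counts) - 1)

min(expected) >= 5  # expected-count condition
dof >= 2            # degrees-of-freedom condition
```

Because every row total is 3,788, each expected count is a quarter of its column total: 2,852.75 for No and 935.25 for Yes. `chisq.test(counts)$expected` returns the same matrix.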

chisq.test(table(suicide)) 
## 
##  Pearson's Chi-squared test
## 
## data:  table(suicide)
## X-squared = 3286.1, df = 3, p-value < 2.2e-16

A chi-square test of independence is used to test whether the four suicide reasons receive the same proportion of support and opposition. The p-value is below 2.2e-16, far smaller than a 0.05 significance level, so we reject the null hypothesis and conclude that there is an association between opinion towards suicide and suicidal motives.
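To see where the reported statistic comes from, the sketch below recomputes it by hand from the observed counts (re-entered manually; names are illustrative) and gets the upper-tail p-value from the chi-square distribution with `pchisq()`. It then compares the result with base R's `chisq.test()`, which applies no continuity correction to tables larger than 2 x 2.

```r
# Observed counts from Part 3 (rows = reason, columns = opinion)
counts <- matrix(
  c(1551, 2237,
    3384,  404,
    3381,  407,
    3095,  693),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    reason  = c("INCURABLE DISEASE", "BANKRUPT",
                "DISHONORED FAMILY", "TIRED OF LIVING"),
    opinion = c("No", "Yes")
  )
)

expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells
x2  <- sum((counts - expected)^2 / expected)
dof <- (nrow(counts) - 1) * (ncol(counts) - 1)

# Upper-tail probability of a chi-square distribution with dof degrees of freedom
p_value <- pchisq(x2, df = dof, lower.tail = FALSE)

x2
p_value
chisq.test(counts)$statistic  # base R's value for comparison
```

The hand-computed statistic is about 3286.1 with 3 degrees of freedom, matching the output above.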

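Only the theoretical chi-square test is reported. No confidence interval accompanies it, because the chi-square test of independence compares a whole table of counts rather than estimating a single parameter. A simulation-based version of the test is available (for example, chisq.test() with simulate.p.value = TRUE), but with a statistic this extreme it would be expected to lead to the same conclusion.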
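Base R's `chisq.test()` can also compute a Monte Carlo p-value by simulating tables with the same row and column totals, which gives a simulation-based counterpart to the theoretical test. In the sketch below, the seed and the number of replicates `B` are arbitrary choices, and the counts are re-entered from Part 3.

```r
# Observed counts from Part 3 (rows = reason, columns = opinion)
counts <- matrix(
  c(1551, 2237,
    3384,  404,
    3381,  407,
    3095,  693),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    reason  = c("INCURABLE DISEASE", "BANKRUPT",
                "DISHONORED FAMILY", "TIRED OF LIVING"),
    opinion = c("No", "Yes")
  )
)

set.seed(2024)  # arbitrary seed so the simulated p-value is reproducible

# Simulate B tables with fixed margins and count how often their statistic
# is at least as large as the observed one
sim <- chisq.test(counts, simulate.p.value = TRUE, B = 2000)
sim$p.value
```

Since no simulated table is likely to come anywhere near the observed statistic, expect the smallest value this method can report, 1/(B + 1). Note that simulation only replaces the chi-square approximation to the null distribution; it keeps the same independence assumption as the theoretical test.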

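We learned that while most respondents oppose suicide for three of the four reasons, opinion varies sharply with the reason given: a majority (about 59%) support a person's right to end his or her own life in the case of an incurable disease.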