Setup

Assignment description

This is the capstone project by Kristen Phan for Duke University’s Inferential Statistics course (Course URL).

The project consists of 4 parts:
1. Data: Summarize background information on the dataset including its generalizability and causality
2. Research question: Formulate a research question pertaining to the dataset
3. Exploratory Data Analysis: Perform EDA on the dataset
4. Inference: Perform statistical inference (confidence interval and hypothesis testing) on the dataset to address the research question set forth

Load data


Part 1: Data

1. Background:
[Excerpted from the GSS project description]
Since 1972, the General Social Survey (GSS) has been monitoring societal change and studying the growing complexity of American society. The GSS aims to gather data on contemporary American society in order to monitor and explain trends and constants in attitudes, behaviors, and attributes; to examine the structure and functioning of society in general as well as the role played by relevant subgroups; to compare the United States to other societies in order to place American society in comparative perspective and develop cross-national models of human society; and to make high-quality data easily accessible to scholars, students, policy makers, and others, with minimal cost and waiting.
GSS questions cover a diverse range of issues including national spending priorities, marijuana use, crime and punishment, race relations, quality of life, confidence in institutions, and sexual behavior. More information on the dataset here.

2. Generalizability:
According to Appendix A Sampling Design and Weighting, the study employs multi-stage, random cluster sampling. However, the most recent documented response rate is 61.3% in 2016 (source). The generalizability of the study is therefore limited by possible non-response bias.

3. Causality:
Because this is an observational study, there was no random assignment. As a result, we can only draw inferences about association, not causality, from the dataset.


Part 2: Research question

Is there a correlation between religious engagement and financial satisfaction for respondents who identified themselves as middle class?
Religious engagement is measured by how often respondents attend religious services (low, medium, or high engagement). Financial satisfaction is measured by the proportion of respondents who claimed to be satisfied with their present financial situation.


Part 3: Exploratory data analysis

First, we extract the data pertaining to religious engagement (‘attend’), financial satisfaction (‘satfin’), and social class (‘class’), restricting the analysis to respondents who identified as middle class.

## 'data.frame':    19326 obs. of  2 variables:
##  $ attend : Factor w/ 9 levels "Never","Lt Once A Year",..: 3 8 3 8 9 8 8 6 6 4 ...
##  $ sat_fin: Factor w/ 3 levels "Satisfied","More Or Less",..: 3 2 2 2 1 1 2 1 1 1 ...
##            Never   Lt Once A Year      Once A Year Sevrl Times A Yr 
##                0             1586             2876             2872 
##     Once A Month     2-3X A Month  Nrly Every Week       Every Week 
##             1655             2006             1362             5224 
## More Thn Once Wk 
##             1745

Next, we create a new variable called Religious Engagement (‘reli_eng’) to categorize respondents based on how often they attend religious services:
1. Low engagement: never, less than once a year, or once a year
2. Medium engagement: several times a year, once a month, or 2-3x a month
3. High engagement: nearly every week, every week, or more than once a week
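
The grouping listed above can be sketched as a simple lookup. This is a minimal Python illustration, not the analysis code behind this report (which is not shown); the names `ENGAGEMENT_LEVEL` and `recode_attend` are hypothetical, and the keys are the level labels as printed in the ‘attend’ frequency table.

```python
# Hypothetical lookup from the 'attend' labels to the three engagement
# levels, following the grouping exactly as listed in the text.
ENGAGEMENT_LEVEL = {
    "Never": "low",
    "Lt Once A Year": "low",
    "Once A Year": "low",
    "Sevrl Times A Yr": "med",
    "Once A Month": "med",
    "2-3X A Month": "med",
    "Nrly Every Week": "high",
    "Every Week": "high",
    "More Thn Once Wk": "high",
}


def recode_attend(label):
    """Return 'low', 'med', or 'high' for an 'attend' label.

    Unknown labels raise KeyError instead of silently falling
    into a default group.
    """
    return ENGAGEMENT_LEVEL[label]


print(recode_attend("Once A Month"))
```

Raising an error on unrecognized labels, rather than using a catch-all default, makes it easier to notice a level that would otherwise end up in the wrong group.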

## # A tibble: 3 x 3
##   reli_eng count percentage
##   <chr>    <int>      <dbl>
## 1 high     11203         58
## 2 low       4462         23
## 3 med       3661         19

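As shown in the table, the majority of middle-class respondents (58%) fall in the high-engagement group. Note that these counts match the frequency table above only if “Sevrl Times A Yr” (2,872 respondents) is counted as high rather than medium engagement: 1,362 + 5,224 + 1,745 + 2,872 = 11,203 for high, and 1,655 + 2,006 = 3,661 for medium.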
In the dataset, respondents were asked if they were “Satisfied”, “More Or Less”, or “Not At All Satisfied” with their current financial situation. In this analysis, we simplify the responses by marking “More Or Less” and “Not At All Satisfied” as “Not Satisfied”.

## # A tibble: 2 x 2
##   sat         count
##   <chr>       <int>
## 1 satisfied    7614
## 2 unsatisfied 11712
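
The collapsing of three responses into two groups can be sketched as follows. This is an illustrative Python snippet under stated assumptions: the response labels are taken as quoted in the text, the function name `collapse_satisfaction` is hypothetical, and the example responses are made up rather than drawn from the GSS.

```python
from collections import Counter


def collapse_satisfaction(response):
    """Map the three financial-satisfaction responses onto two groups."""
    if response == "Satisfied":
        return "satisfied"
    if response in ("More Or Less", "Not At All Satisfied"):
        return "unsatisfied"
    raise ValueError(f"unexpected response: {response!r}")


# Made-up responses for illustration only, not GSS data.
example = ["Satisfied", "More Or Less", "Not At All Satisfied", "Satisfied"]
counts = Counter(collapse_satisfaction(r) for r in example)
print(counts["satisfied"], counts["unsatisfied"])
```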

Next, we visualize the proportion of middle-class respondents who are satisfied with their current financial situation across the different levels of religious engagement.
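
As a numeric counterpart to the visualization, the sketch below computes the share of satisfied respondents at each engagement level. It uses the observed counts reported in the table under “Check conditions” in Part 4; it is written in Python for illustration, and the variable names are hypothetical.

```python
# Observed counts per engagement level as (satisfied, unsatisfied),
# copied from the table in the "Check conditions" section.
observed = {
    "low": (1562, 2900),
    "med": (1382, 2279),
    "high": (4670, 6533),
}

# Share of satisfied respondents within each engagement level.
share_satisfied = {
    level: sat / (sat + unsat) for level, (sat, unsat) in observed.items()
}

for level, share in share_satisfied.items():
    print(f"{level:>4}: {share:.1%} satisfied")
```

On these counts, roughly 35% of low-engagement, 38% of medium-engagement, and 42% of high-engagement respondents report being satisfied.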

To test whether there is a correlation between religious engagement and financial satisfaction, we now turn to statistical inference.


Part 4: Inference


Hypotheses

Null hypothesis H0: Religious engagement and financial satisfaction are independent

Alternative hypothesis Ha: Religious engagement and financial satisfaction are dependent

Inferential method

In our inferential analysis, we aim to determine whether the respondents’ religious engagement and financial satisfaction are independent, using a chi-square test of independence. Religious engagement is measured as low, medium, or high based on how often the respondents attend religious services. Financial satisfaction is calculated as the proportion of respondents who are satisfied with their current financial situation.

Because the respondents’ social class might confound their financial satisfaction, the analysis is limited to respondents who identified themselves as middle class.
The underlying rationale behind this inference is that some people might find comfort and encouragement in their religion, which then affects their perspective and life satisfaction, including financial satisfaction. It is important to emphasize that this study does not infer causation, only correlation. In other words, the results of this study in no way imply that high religious engagement leads to (or causes) high financial satisfaction.

Finally, we set the significance level at alpha = 5%.

Check conditions

  1. Independence:
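    The GSS uses random sampling, the 19,326 respondents in this analysis are far fewer than 10% of the U.S. population, and each respondent contributes to exactly one cell of the table, so it is reasonable to assume that the observations are independent of each other.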

  2. Sample size:
    Every cell (each combination of religious engagement level and satisfaction outcome) has at least 5 expected cases, as shown in the table below.

## # A tibble: 3 x 6
##   reli_eng n_respdents obs_sat expected_sat obs_unsat expected_unsat
##   <chr>          <int>   <int>        <dbl>     <int>          <dbl>
## 1 high           11203    4670         4414      6533           6789
## 2 low             4462    1562         1758      2900           2704
## 3 med             3661    1382         1442      2279           2219
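
The expected counts in the table can be reproduced from the row and column totals. The sketch below is a Python illustration under the usual independence assumption (expected count = row total × column total / grand total); the totals are copied from the tables in this report and the variable names are hypothetical.

```python
# Row totals (respondents per engagement level) and column totals
# (satisfied / unsatisfied), copied from the tables in this report.
row_totals = {"high": 11203, "low": 4462, "med": 3661}
col_totals = {"satisfied": 7614, "unsatisfied": 11712}
grand_total = sum(row_totals.values())

# Under independence: expected = row total * column total / grand total.
expected = {
    level: {
        outcome: n_row * n_col / grand_total
        for outcome, n_col in col_totals.items()
    }
    for level, n_row in row_totals.items()
}

for level, cells in expected.items():
    print(level, {outcome: round(value) for outcome, value in cells.items()})

# Sample-size condition: every expected count is at least 5.
condition_met = all(
    value >= 5 for cells in expected.values() for value in cells.values()
)
print(condition_met)
```

Rounded to whole respondents, these values agree with the expected_sat and expected_unsat columns above.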

Since both conditions are met, we now move on to performing the chi-square test.

Perform inference

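Now we calculate the chi-square statistic, degrees of freedom (df), and p-value. With 3 levels of religious engagement and 2 levels of financial satisfaction, df = (3 - 1) x (2 - 1) = 2.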

## [1] 64.67866
## [1] 2
## [1] 9.020032e-15
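
The three values above can be checked by hand from the observed and expected counts. The following is a minimal Python sketch, not the original analysis code: it takes the expected counts as printed (rounded) in the “Check conditions” table, and it relies on the fact that for df = 2 the chi-square upper-tail probability has the closed form exp(-x / 2).

```python
import math

# (observed, expected) pairs for the six cells, copied from the
# "Check conditions" table; expected counts are rounded as printed there.
cells = [
    (4670, 4414), (6533, 6789),  # high: satisfied, unsatisfied
    (1562, 1758), (2900, 2704),  # low: satisfied, unsatisfied
    (1382, 1442), (2279, 2219),  # med: satisfied, unsatisfied
]

# Pearson chi-square statistic: sum of (O - E)^2 / E over all cells.
chi_sq = sum((o - e) ** 2 / e for o, e in cells)

# Degrees of freedom: (rows - 1) * (columns - 1) for a 3 x 2 table.
df = (3 - 1) * (2 - 1)

# For df = 2 the upper-tail probability is exp(-x / 2), so no
# statistics library is needed for this particular case.
p_value = math.exp(-chi_sq / 2)

print(round(chi_sq, 3), df, p_value)
```

With the rounded expected counts this gives a statistic of about 64.68 and a p-value of about 9.0e-15, in line with the output above; unrounded expected counts would give a slightly different statistic.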

Interpret results

Because the p-value (about 9.0e-15) is less than alpha (5%), we reject the null hypothesis. The data provide convincing evidence that religious engagement and financial satisfaction are correlated among middle-class respondents.