SD2: Chi-Squared Test for Independence

Research Question / Hypothesis

I hypothesize that poverty status and mental health are related. To test this, I will analyze survey responses on both variables with a chi-squared test for independence, which checks whether the distribution of mental health categories differs by poverty status.

Package Loading, Data Import, Data Prep

library(readr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
data <- read.csv("/Users/Nazija/Downloads/SD2 Data(1).csv")

data <- data%>%
  select(poverty_status, mental_health)
unique(data$poverty_status)

## [1] above poverty below poverty
## Levels: above poverty below poverty

unique(data$mental_health)

## [1] Low Risk                 Moderate Mental Distress Serious Mental Illness  
## Levels: Low Risk Moderate Mental Distress Serious Mental Illness

head(data)

##   poverty_status            mental_health
## 1  above poverty                 Low Risk
## 2  above poverty                 Low Risk
## 3  above poverty                 Low Risk
## 4  above poverty Moderate Mental Distress
## 5  above poverty                 Low Risk
## 6  above poverty Moderate Mental Distress

Data Summary

[Poverty_Status Response Distribution]

table(data$poverty_status)%>%
  prop.table()%>%
  round(2)

## 
## above poverty below poverty 
##          0.83          0.17

[Mental_Health Response Distribution]

table(data$mental_health)%>%
  prop.table()%>%
  round(2)

## 
##                 Low Risk Moderate Mental Distress   Serious Mental Illness 
##                     0.80                     0.16                     0.03

[Expected Crosstab Distribution]

above_lr <- 0.83 * 0.80
above_moderate <- 0.83 * 0.16
above_serious <- 0.83 * 0.03

below_lr <- 0.17 * 0.80
below_moderate <- 0.17 * 0.16
below_serious <- 0.17 * 0.03

print("Expected Values Above Poverty (low risk, moderate, serious)")

## [1] "Expected Values Above Poverty (low risk, moderate, serious)"

above_lr

## [1] 0.664

above_moderate

## [1] 0.1328

above_serious

## [1] 0.0249

print("Expected Values Below Poverty (low risk, moderate, serious)")

## [1] "Expected Values Below Poverty (low risk, moderate, serious)"

below_lr

## [1] 0.136

below_moderate

## [1] 0.0272

below_serious

## [1] 0.0051

chisq.test(data$poverty_status, data$mental_health)[7]

## $expected
##                    data$mental_health
## data$poverty_status  Low Risk Moderate Mental Distress Serious Mental Illness
##       above poverty 171875.35                34582.577               7155.074
##       below poverty  34036.65                 6848.423               1416.926

[Observed Crosstab Distribution]

table(data$poverty_status, data$mental_health)%>%
  prop.table()%>%
  round(2)

##                
##                 Low Risk Moderate Mental Distress Serious Mental Illness
##   above poverty     0.69                     0.12                   0.02
##   below poverty     0.11                     0.04                   0.01

chisq.test(data$poverty_status, data$mental_health)[6]

## $observed
##                    data$mental_health
## data$poverty_status Low Risk Moderate Mental Distress Serious Mental Illness
##       above poverty   177273                    31235                   5105
##       below poverty    28639                    10196                   3467

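The observed counts differ from the counts expected under independence. Respondents above poverty fall into the Low Risk category more often than expected (177,273 observed vs. about 171,875 expected) and into both distress categories less often. Respondents below poverty show the opposite pattern: fewer are Low Risk than expected, and more are experiencing moderate mental distress or serious mental illness (for example, 3,467 observed vs. about 1,417 expected for Serious Mental Illness).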
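The expected counts reported by chisq.test() can also be reproduced by hand, which makes the comparison above easier to follow. The sketch below is illustrative only: it hard-codes the observed counts from the `$observed` table in this report rather than reading the data file, and the names `observed` and `expected` are chosen here for the example.

```r
# Observed counts copied from the $observed table in this report.
observed <- matrix(
  c(177273, 31235, 5105,
     28639, 10196, 3467),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    poverty_status = c("above poverty", "below poverty"),
    mental_health  = c("Low Risk", "Moderate Mental Distress",
                       "Serious Mental Illness")
  )
)

# Expected count under independence:
# (row total * column total) / grand total, for every cell.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 2)

# Observed minus expected: positive values mean more respondents
# in that cell than independence would predict.
round(observed - expected)
```

If the counts are copied correctly, `expected` should match the `$expected` table printed by chisq.test(), and the signs of the differences should match the pattern described above.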

Data Analysis

Relationship of Interest: Crosstab showing poverty_status percentages

table(data$poverty_status, data$mental_health)%>%
  prop.table(1)%>%
  round(2)

##                
##                 Low Risk Moderate Mental Distress Serious Mental Illness
##   above poverty     0.83                     0.15                   0.02
##   below poverty     0.68                     0.24                   0.08

Relationship of Interest: [Visualization]

data%>%
  group_by(poverty_status, mental_health)%>%
  summarize(n = n())%>%
  mutate(percentage = n/sum(n))%>%
  ggplot()+
  geom_col(aes(x = poverty_status, y = percentage, fill = mental_health))

## `summarise()` regrouping output by 'poverty_status' (override with `.groups` argument)

The visualization shows that a greater percentage of people below poverty experience moderate mental distress or serious mental illness (24% and 8%) than people above poverty (15% and 2%), while a greater percentage of people above poverty are at low risk of mental health issues (83% vs. 68%).

Chi-Squared Test

chisq.test(data$poverty_status, data$mental_health)

## 
##  Pearson's Chi-squared test
## 
## data:  data$poverty_status and data$mental_health
## X-squared = 6539.4, df = 2, p-value < 2.2e-16

The results indicate a statistically significant relationship between individuals’ poverty status and their mental health (X-squared = 6539.4, df = 2, p-value < 2.2e-16). Because the p-value is far below .05, we reject the null hypothesis that mental health is independent of poverty status.
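To show where the test statistic comes from, the sketch below recomputes it by hand as the sum of (observed - expected)^2 / expected over all six cells. It is a minimal illustration that assumes the observed counts reported by chisq.test() in this report; the variable names are chosen here for the example and are not part of the original analysis.

```r
# Observed counts copied from the $observed table in this report.
observed <- matrix(
  c(177273, 31235, 5105,
     28639, 10196, 3467),
  nrow = 2, byrow = TRUE
)

# Expected counts under independence.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-squared statistic: sum over cells of (O - E)^2 / E.
x_squared <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1).
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-squared distribution.
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

round(x_squared, 1)
df
p_value
```

The statistic and degrees of freedom should agree with the chisq.test() output above. The p-value is too small to be represented meaningfully, which is why chisq.test() reports it only as "< 2.2e-16".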