Research Question / Hypothesis

I hypothesize that poverty status and mental health are related. To test this, I analyze survey responses on both variables and use a chi-squared test of independence to check whether mental health depends on poverty status.

Package Loading, Data Import, Data Prep

library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
data <- read.csv("/Users/Nazija/Downloads/SD2 Data(1).csv")

data <- data%>%
  select(poverty_status, mental_health)
unique(data$poverty_status)
## [1] above poverty below poverty
## Levels: above poverty below poverty
unique(data$mental_health)
## [1] Low Risk                 Moderate Mental Distress Serious Mental Illness  
## Levels: Low Risk Moderate Mental Distress Serious Mental Illness
head(data)
##   poverty_status            mental_health
## 1  above poverty                 Low Risk
## 2  above poverty                 Low Risk
## 3  above poverty                 Low Risk
## 4  above poverty Moderate Mental Distress
## 5  above poverty                 Low Risk
## 6  above poverty Moderate Mental Distress

Data Summary

[Poverty_Status Response Distribution]

table(data$poverty_status)%>%
  prop.table()%>%
  round(2)
## 
## above poverty below poverty 
##          0.83          0.17

[Mental_Health Response Distribution]

table(data$mental_health)%>%
  prop.table()%>%
  round(2)
## 
##                 Low Risk Moderate Mental Distress   Serious Mental Illness 
##                     0.80                     0.16                     0.03

[Expected Crosstab Distribution]

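# Expected cell proportions under independence:
# (share in poverty group) * (share in mental health category)
above_lr <- .83 * .80
above_moderate <- .83 * .16
above_serious <- .83 * .03

below_lr <- .17 * .80
below_moderate <- .17 * .16
below_serious <- .17 * .03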

print("Expected Values Above Poverty (low risk, moderate, serious)")
## [1] "Expected Values Above Poverty (low risk, moderate, serious)"
above_lr
## [1] 0.664
above_moderate
## [1] 0.1328
above_serious
## [1] 0.0249
print("Expected Values Below Poverty (low risk, moderate, serious)")
## [1] "Expected Values Below Poverty (low risk, moderate, serious)"
below_lr
## [1] 0.136
below_moderate
## [1] 0.0272
below_serious
## [1] 0.0051
chisq.test(data$poverty_status, data$mental_health)[7]
## $expected
##                    data$mental_health
## data$poverty_status  Low Risk Moderate Mental Distress Serious Mental Illness
##       above poverty 171875.35                34582.577               7155.074
##       below poverty  34036.65                 6848.423               1416.926
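
The hand calculation above multiplies rounded marginal proportions, so it only approximates the expected cell shares. As a rough sketch, the expected counts reported by chisq.test() can be rebuilt from the unrounded margins, since each expected count is the row total times the column total divided by the sample size. The `observed` matrix here is typed in from the observed counts printed in this report rather than read from the original CSV, and the names `observed`, `n`, and `expected` are illustrative assumptions.

```r
# Observed counts copied by hand from this report's observed crosstab
# (assumption: typed in, not read from the original CSV)
observed <- matrix(
  c(177273, 31235, 5105,
     28639, 10196, 3467),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    poverty_status = c("above poverty", "below poverty"),
    mental_health  = c("Low Risk", "Moderate Mental Distress",
                       "Serious Mental Illness")
  )
)

n <- sum(observed)

# Expected count under independence = row total * column total / n
expected <- outer(rowSums(observed), colSums(observed)) / n
round(expected, 2)
```

Because the margins are not rounded first, these values should line up with the `$expected` table printed by chisq.test().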

[Observed Crosstab Distribution]

table(data$poverty_status, data$mental_health)%>%
  prop.table()%>%
  round(2)
##                
##                 Low Risk Moderate Mental Distress Serious Mental Illness
##   above poverty     0.69                     0.12                   0.02
##   below poverty     0.11                     0.04                   0.01
chisq.test(data$poverty_status, data$mental_health)[6]
## $observed
##                    data$mental_health
## data$poverty_status Low Risk Moderate Mental Distress Serious Mental Illness
##       above poverty   177273                    31235                   5105
##       below poverty    28639                    10196                   3467

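The observed counts differ from the expected counts. Respondents above poverty fall in the Low Risk category more often than expected (177,273 observed vs. about 171,875 expected), while respondents below poverty are experiencing more mental distress than expected: 10,196 vs. about 6,848 in Moderate Mental Distress and 3,467 vs. about 1,417 in Serious Mental Illness.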
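
One way to make this comparison explicit is to subtract the expected counts from the observed counts cell by cell. The sketch below assumes the counts printed in this report, typed into a matrix by hand. The names `observed`, `test`, and `difference` are illustrative, and the standardized residuals (`$stdres`) are an extra diagnostic that the original analysis does not use.

```r
# Observed counts copied by hand from this report's observed crosstab
# (assumption: typed in, not read from the original CSV)
observed <- matrix(
  c(177273, 31235, 5105,
     28639, 10196, 3467),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    poverty_status = c("above poverty", "below poverty"),
    mental_health  = c("Low Risk", "Moderate Mental Distress",
                       "Serious Mental Illness")
  )
)

test <- chisq.test(observed)

# Positive values: more respondents in the cell than independence predicts
difference <- test$observed - test$expected
round(difference)

# Standardized residuals put each cell's deviation on a comparable scale
round(test$stdres, 1)
```

A positive difference in the below-poverty distress cells and in the above-poverty Low Risk cell matches the pattern described above.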

Data Analysis

Relationship of Interest: Crosstab showing poverty_status percentages

table(data$poverty_status, data$mental_health)%>%
  prop.table(1)%>%
  round(2)
##                
##                 Low Risk Moderate Mental Distress Serious Mental Illness
##   above poverty     0.83                     0.15                   0.02
##   below poverty     0.68                     0.24                   0.08

Relationship of Interest: [Visualization]

data%>%
  group_by(poverty_status, mental_health)%>%
  summarize(n = n())%>%
  mutate(percentage = n/sum(n))%>%
  ggplot()+
  geom_col(aes(x = poverty_status, y = percentage, fill = mental_health))
## `summarise()` regrouping output by 'poverty_status' (override with `.groups` argument)

The visualization shows that a greater percentage of people below poverty experience moderate or serious mental distress (32% vs. 17% of people above poverty), while people above poverty have a greater percentage at low risk of mental health issues (83% vs. 68%).
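
The comparison shown in the chart can also be put into numbers by adding the moderate and serious shares within each poverty group. This is a minimal sketch that assumes the observed counts printed earlier in this report, typed in by hand. The names `observed`, `row_share`, and `distress_share` are illustrative.

```r
# Observed counts copied by hand from this report's observed crosstab
# (assumption: typed in, not read from the original CSV)
observed <- matrix(
  c(177273, 31235, 5105,
     28639, 10196, 3467),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    poverty_status = c("above poverty", "below poverty"),
    mental_health  = c("Low Risk", "Moderate Mental Distress",
                       "Serious Mental Illness")
  )
)

# Share of each mental health category within each poverty group (rows sum to 1)
row_share <- prop.table(observed, margin = 1)

# Combined share reporting moderate or serious distress, per poverty group
distress_share <- rowSums(
  row_share[, c("Moderate Mental Distress", "Serious Mental Illness")]
)
round(distress_share, 2)
```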

Chi-Squared Test

chisq.test(data$poverty_status, data$mental_health)
## 
##  Pearson's Chi-squared test
## 
## data:  data$poverty_status and data$mental_health
## X-squared = 6539.4, df = 2, p-value < 2.2e-16

The results indicate a statistically significant relationship between individuals’ poverty status and their mental health (X-squared = 6539.4, df = 2). Because the p-value (< 2.2e-16) is far below .05, we reject the null hypothesis that the two variables are independent.
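
To show where the test statistic comes from, the sketch below recomputes it by hand as the sum of (observed - expected)^2 / expected over all six cells. It assumes the observed counts printed in this report, typed in by hand. The names `observed`, `expected`, `chi_sq`, `dof`, and `p_value` are illustrative, and this is a check on the arithmetic, not a replacement for chisq.test().

```r
# Observed counts copied by hand from this report's observed crosstab
# (assumption: typed in, not read from the original CSV)
observed <- matrix(
  c(177273, 31235, 5105,
     28639, 10196, 3467),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    poverty_status = c("above poverty", "below poverty"),
    mental_health  = c("Low Risk", "Moderate Mental Distress",
                       "Serious Mental Illness")
  )
)

# Expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-squared statistic: sum of squared deviations scaled by expectation
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of a statistic at least this large
p_value <- pchisq(chi_sq, df = dof, lower.tail = FALSE)

round(chi_sq, 1)
dof
p_value
```

The statistic and degrees of freedom should agree with the chisq.test() output above. The p-value is too small to print as anything other than a value below 2.2e-16.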