Hypothesis

After browsing the NHIS data, I suspect that there is a relationship between marital status and living below the poverty line. I predict that married individuals are more likely to live above the poverty line than unmarried individuals. Confirming or rejecting this prediction may have implications for how the traditional benefits of marriage are viewed in modern society.

Prep

library(ggplot2)
library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
NHISDATA<-read_csv('/Volumes/FLASHDRIVE/Data 333/NHIS Data.csv')
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   Demo_Race = col_logical(),
##   Demo_Hispanic = col_character(),
##   Demo_RaceEthnicity = col_character(),
##   Demo_Region = col_character(),
##   Demo_sex_C = col_character(),
##   Demo_sexorien_C = col_logical(),
##   Demo_agerange_C = col_character(),
##   Demo_marital_C = col_character(),
##   Demo_hourswrk_C = col_character(),
##   MentalHealth_MentalIllnessK6_C = col_character(),
##   MentalHealth_depressionmeds_B = col_logical(),
##   Health_SelfRatedHealth_C = col_character(),
##   Health_diagnosed_STD5yr_B = col_logical(),
##   Health_BirthControlNow_B = col_logical(),
##   Health_EverHavePrediabetes_B = col_logical(),
##   Health_HIVAidsRisk_C = col_character(),
##   Health_BMI_C = col_character(),
##   Health_UsualPlaceHealthcare_C = col_character(),
##   Health_AbnormalPapPast3yr_B = col_logical(),
##   Behav_CigsPerDay_C = col_character()
##   # ... with 1 more columns
## )
## ℹ Use `spec()` for the full column specifications.
## Warning: 683386 parsing failures.
##   row       col           expected                            actual                                         file
## 68557 Demo_Race 1/0/T/F/TRUE/FALSE Black or African American         '/Volumes/FLASHDRIVE/Data 333/NHIS Data.csv'
## 68558 Demo_Race 1/0/T/F/TRUE/FALSE Asian                             '/Volumes/FLASHDRIVE/Data 333/NHIS Data.csv'
## 68559 Demo_Race 1/0/T/F/TRUE/FALSE American Indian or Alaskan Native '/Volumes/FLASHDRIVE/Data 333/NHIS Data.csv'
## 68560 Demo_Race 1/0/T/F/TRUE/FALSE White                             '/Volumes/FLASHDRIVE/Data 333/NHIS Data.csv'
## 68561 Demo_Race 1/0/T/F/TRUE/FALSE White                             '/Volumes/FLASHDRIVE/Data 333/NHIS Data.csv'
## ..... ......... .................. ................................. ............................................
## See problems(...) for more details.
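
The parsing failures above arise because `read_csv` guessed `Demo_Race` to be a logical column, presumably because the rows it sampled for type guessing held no race labels. A minimal sketch of one way to avoid this is to declare the column type explicitly with `col_types`. The toy file and the `toy_path` and `toy` names below are hypothetical stand-ins for the real NHIS file, not part of the original analysis.

```r
library(readr)

# Hypothetical toy file: Demo_Race is blank in the early rows, which is the
# kind of pattern that can lead type guessing to pick logical.
toy_path <- tempfile(fileext = ".csv")
writeLines(c("Demo_Race,Demo_belowpovertyline_B",
             ",0",
             ",1",
             "White,0",
             "Asian,1"),
           toy_path)

# Declare Demo_Race as character; every other column is still guessed.
toy <- read_csv(toy_path,
                col_types = cols(Demo_Race = col_character(),
                                 .default = col_guess()))

toy$Demo_Race
```

The same `col_types` argument could be passed when reading the full NHIS file. The other columns guessed as logical (for example `Demo_sexorien_C`) may need the same treatment, but that depends on what they actually contain.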

Chi Square Null

Null: There is no relationship between marital status and living below the poverty line.

chisq.test(NHISDATA$Demo_marital_C,NHISDATA$Demo_belowpovertyline_B)[7]
## $expected
##                        
## NHISDATA$Demo_marital_C         0         1
##     DivorcedOrSeparated  81532.34 15948.663
##     Married             202362.56 39584.443
##     Never Married       113204.00 22144.003
##     Widowed              39542.11  7734.891

Chi Square Actual

Alternative: There is a relationship between marital status and living below the poverty line.

chisq.test(NHISDATA$Demo_marital_C,NHISDATA$Demo_belowpovertyline_B)[6]
## $observed
##                        
## NHISDATA$Demo_marital_C      0      1
##     DivorcedOrSeparated  76518  20963
##     Married             222564  19383
##     Never Married        99447  35901
##     Widowed              38112   9165

Interpretation: The differences between the expected (null) and observed (actual) counts appear to be meaningful in every row/column combination. When the percent difference between the two tables is calculated, the gaps are large. For example, 19,383 married individuals live below the poverty line, roughly half of the 39,584 expected under the null. Measured relative to the average of the two values, that is a difference of about 69%.
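
The arithmetic behind the expected counts and the percent difference can be sketched as follows. This assumes the percent difference is the gap divided by the mean of the two values, which reproduces the 69% figure. The `observed`, `expected`, and `pct_diff` names are illustrative, and the counts are typed in from the observed table rather than read from the data file.

```r
# Observed counts copied from the $observed output
observed <- matrix(
  c( 76518, 20963,
    222564, 19383,
     99447, 35901,
     38112,  9165),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("DivorcedOrSeparated", "Married", "Never Married", "Widowed"),
    c("0", "1")
  )
)

# Expected count under the null: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Percent difference relative to the mean of observed and expected
pct_diff <- abs(observed - expected) / ((observed + expected) / 2) * 100

round(expected["Married", "1"], 3)
round(pct_diff["Married", "1"])
```

Dividing by the expected count instead of the mean would give a different figure (about 51% for the same cell), so it is worth stating which definition is used.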

Row %

table_row <- table(NHISDATA$Demo_marital_C, NHISDATA$Demo_belowpovertyline_B) %>%
  prop.table(1)
table_row
##                      
##                                0          1
##   DivorcedOrSeparated 0.78495297 0.21504703
##   Married             0.91988741 0.08011259
##   Never Married       0.73475042 0.26524958
##   Widowed             0.80614252 0.19385748

Visualization

NHIS <- NHISDATA %>%
  select(MaritalStatus = Demo_marital_C, PovertyLine = Demo_belowpovertyline_B) %>%
  na.omit()  # drop rows with a missing value in either variable

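NHIS %>%
  group_by(MaritalStatus, PovertyLine) %>%
  summarize(n = n()) %>%
  mutate(Percent = n / sum(n)) %>%
  # Treat the 0/1 poverty flag as a category so the fill colours are discrete
  ggplot(aes(x = MaritalStatus, y = Percent,
             fill = factor(PovertyLine), group = PovertyLine)) +
  geom_col() +
  # Label each segment with a rounded percentage instead of a raw proportion
  geom_text(aes(label = scales::percent(Percent, accuracy = 1)),
            colour = "cyan", position = position_stack(vjust = 0.5)) +
  labs(fill = "Below poverty line") +
  ggtitle("Poverty Level by Marital Status")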
## `summarise()` has grouped output by 'MaritalStatus'. You can override using the `.groups` argument.

Interpretation: The crosstab and column chart results agree with my hypothesis. About 92% of married individuals live above the poverty line, compared with 8% who live below it. Never-married individuals have the lowest share living above the poverty line (73%) of any marital status.

Chi-Square Stat Test

chisq.test(NHISDATA$Demo_marital_C,NHISDATA$Demo_belowpovertyline_B)
## 
##  Pearson's Chi-squared test
## 
## data:  NHISDATA$Demo_marital_C and NHISDATA$Demo_belowpovertyline_B
## X-squared = 24746, df = 3, p-value < 2.2e-16

Interpretation: The p-value is less than 0.05, so the result is statistically significant and the null hypothesis is rejected. In this dataset, there is an association between marital status and living below the poverty line.
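
As a quick check that does not need the data file, the test can be rerun on the observed counts alone. This is a sketch under the assumption that the observed table above contains every case used in the test. The `observed` and `result` names are illustrative.

```r
# Observed counts copied from the $observed output
observed <- matrix(
  c( 76518, 20963,
    222564, 19383,
     99447, 35901,
     38112,  9165),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("DivorcedOrSeparated", "Married", "Never Married", "Widowed"),
    c("0", "1")
  )
)

# Pearson's chi-squared test of independence on the 4 x 2 table
result <- chisq.test(observed)

result$statistic   # X-squared
result$parameter   # degrees of freedom: (4 - 1) * (2 - 1)
result$p.value     # compared against the 0.05 threshold
```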