After browsing the NHIS data, I suspect that there may be a relationship between marital status and poverty status. I predict that married individuals are more likely than unmarried individuals to live above the poverty line. Confirming or rejecting this prediction may have implications for how we view the traditional benefits of marriage in modern society.
library(ggplot2)
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
NHISDATA<-read_csv('/Volumes/FLASHDRIVE/Data 333/NHIS Data.csv')
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## .default = col_double(),
## Demo_Race = col_logical(),
## Demo_Hispanic = col_character(),
## Demo_RaceEthnicity = col_character(),
## Demo_Region = col_character(),
## Demo_sex_C = col_character(),
## Demo_sexorien_C = col_logical(),
## Demo_agerange_C = col_character(),
## Demo_marital_C = col_character(),
## Demo_hourswrk_C = col_character(),
## MentalHealth_MentalIllnessK6_C = col_character(),
## MentalHealth_depressionmeds_B = col_logical(),
## Health_SelfRatedHealth_C = col_character(),
## Health_diagnosed_STD5yr_B = col_logical(),
## Health_BirthControlNow_B = col_logical(),
## Health_EverHavePrediabetes_B = col_logical(),
## Health_HIVAidsRisk_C = col_character(),
## Health_BMI_C = col_character(),
## Health_UsualPlaceHealthcare_C = col_character(),
## Health_AbnormalPapPast3yr_B = col_logical(),
## Behav_CigsPerDay_C = col_character()
## # ... with 1 more columns
## )
## ℹ Use `spec()` for the full column specifications.
## Warning: 683386 parsing failures.
## row col expected actual file
## 68557 Demo_Race 1/0/T/F/TRUE/FALSE Black or African American '/Volumes/FLASHDRIVE/Data 333/NHIS Data.csv'
## 68558 Demo_Race 1/0/T/F/TRUE/FALSE Asian '/Volumes/FLASHDRIVE/Data 333/NHIS Data.csv'
## 68559 Demo_Race 1/0/T/F/TRUE/FALSE American Indian or Alaskan Native '/Volumes/FLASHDRIVE/Data 333/NHIS Data.csv'
## 68560 Demo_Race 1/0/T/F/TRUE/FALSE White '/Volumes/FLASHDRIVE/Data 333/NHIS Data.csv'
## 68561 Demo_Race 1/0/T/F/TRUE/FALSE White '/Volumes/FLASHDRIVE/Data 333/NHIS Data.csv'
## ..... ......... .................. ................................. ............................................
## See problems(...) for more details.
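The parsing failures above arise because `read_csv()` guessed `Demo_Race` to be logical from the early rows and then met text values such as "White" further down. A minimal sketch of one way to avoid this is to declare the column type explicitly through `col_types`; raising `guess_max` is another option. The tiny temporary file and the `demo` object below are hypothetical stand-ins for illustration, not the real NHIS file.

```r
library(readr)

# Hypothetical miniature of the problem: a text column whose first rows are empty.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("Demo_Race,Demo_belowpovertyline_B",
    ",0",
    ",1",
    "White,0",
    "Asian,1"),
  csv_path
)

# Declare the type of the problem column; let readr guess the rest.
demo <- read_csv(
  csv_path,
  col_types = cols(
    Demo_Race = col_character(),
    .default = col_guess()
  )
)

# Number of rows readr could not parse under these column types.
nrow(problems(demo))
```

The same `col_types` argument could be passed when reading the full file, listing each column that was wrongly guessed as logical.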
Null hypothesis: There is no relationship between marital status and living below the poverty line. The counts expected under the null are:
chisq.test(NHISDATA$Demo_marital_C,NHISDATA$Demo_belowpovertyline_B)["expected"]
## $expected
##
## NHISDATA$Demo_marital_C 0 1
## DivorcedOrSeparated 81532.34 15948.663
## Married 202362.56 39584.443
## Never Married 113204.00 22144.003
## Widowed 39542.11 7734.891
Alternative hypothesis: There is a relationship between marital status and living below the poverty line. The observed counts are:
chisq.test(NHISDATA$Demo_marital_C,NHISDATA$Demo_belowpovertyline_B)["observed"]
## $observed
##
## NHISDATA$Demo_marital_C 0 1
## DivorcedOrSeparated 76518 20963
## Married 222564 19383
## Never Married 99447 35901
## Widowed 38112 9165
Interpretation: The differences between the counts expected under the null and the observed counts appear to be meaningful in every row/column combination. Calculating the percent difference between the expected and observed tables gives surprisingly large numbers. For example, there is a 69% difference (taken relative to the average of the two counts) for married individuals living below the poverty line: about 39,584 expected versus 19,383 observed.
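The percent differences can be reproduced from the two tables. The sketch below re-enters the observed counts by hand, rebuilds the expected counts as (row total × column total) / grand total, and assumes the symmetric definition of percent difference (absolute difference divided by the mean of the two values), which is the definition that gives roughly 69% for the Married / below-poverty cell. The object names `observed`, `expected`, and `pct_diff` are illustrative.

```r
# Observed counts copied from the chisq.test() output above.
observed <- matrix(
  c( 76518, 20963,
    222564, 19383,
     99447, 35901,
     38112,  9165),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("DivorcedOrSeparated", "Married", "Never Married", "Widowed"),
    c("0", "1")
  )
)

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Symmetric percent difference (an assumed definition): |O - E| / mean(O, E).
pct_diff <- 100 * abs(observed - expected) / ((observed + expected) / 2)
round(pct_diff, 1)
```

Dividing by the expected count alone instead would give a smaller figure for the same cell, so the definition matters when reporting these numbers.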
table_row <- table(NHISDATA$Demo_marital_C, NHISDATA$Demo_belowpovertyline_B) %>%
  prop.table(1)
table_row
##
## 0 1
## DivorcedOrSeparated 0.78495297 0.21504703
## Married 0.91988741 0.08011259
## Never Married 0.73475042 0.26524958
## Widowed 0.80614252 0.19385748
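# Keep only the two variables of interest and drop rows with missing values
NHIS <- NHISDATA %>%
  select(MaritalStatus = Demo_marital_C, PovertyLine = Demo_belowpovertyline_B) %>%
  na.omit()
# Share of each marital status above (0) and below (1) the poverty line
NHIS %>%
  group_by(MaritalStatus, PovertyLine) %>%
  summarize(n = n()) %>%
  mutate(Percent = n / sum(n)) %>%
  ggplot() +
  geom_col(aes(x = MaritalStatus, y = Percent, fill = PovertyLine)) +
  # Label each segment with a rounded percentage instead of a long decimal
  geom_text(aes(x = MaritalStatus, y = Percent, label = scales::percent(Percent, accuracy = 1)),
            colour = "cyan", position = position_stack(vjust = 0.5)) +
  ggtitle("Poverty Level by Marital Status")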
## `summarise()` has grouped output by 'MaritalStatus'. You can override using the `.groups` argument.
Interpretation: The crosstab and column chart results agree with my hypothesis. About 92% of married individuals live above the poverty line, compared to the 8% who live below it. Never-married individuals have the lowest share living above the poverty line (73%) of any marital status.
chisq.test(NHISDATA$Demo_marital_C,NHISDATA$Demo_belowpovertyline_B)
##
## Pearson's Chi-squared test
##
## data: NHISDATA$Demo_marital_C and NHISDATA$Demo_belowpovertyline_B
## X-squared = 24746, df = 3, p-value < 2.2e-16
Interpretation: The p-value is less than 0.05, so the result is statistically significant and the null hypothesis is rejected. In this dataset, there is an association between marital status and living below the poverty line.
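As a check on the reported statistic, the Pearson chi-squared value can be rebuilt by hand as the sum of (observed − expected)² / expected over all eight cells, with (rows − 1) × (columns − 1) degrees of freedom. The sketch below re-enters the observed counts from the output above; `observed`, `expected`, `x_squared`, and `df` are illustrative names.

```r
# Observed counts copied from the chisq.test() output above.
observed <- matrix(
  c( 76518, 20963,
    222564, 19383,
     99447, 35901,
     38112,  9165),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("DivorcedOrSeparated", "Married", "Never Married", "Widowed"),
    c("0", "1")
  )
)

# Expected counts under independence.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic and degrees of freedom computed by hand.
x_squared <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-squared distribution.
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

# Compare with the built-in test run on the same table.
c(by_hand = x_squared, built_in = unname(chisq.test(observed)$statistic))
```

With a sample of more than half a million respondents, even modest departures from independence produce a very small p-value, so the row proportions remain the better guide to how large the difference is.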