Research Question
I hypothesize that there is a relationship between sex (male or female) and ever having been on medication for depression. Specifically, I expect respondents who identify as female to be more likely to have been on medication for depression.
Prep
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readr)
library(ggplot2)
Dataset<-read.csv('/Users/apple/Downloads/NHIS Data.csv')
head(Dataset)
## psu sampweight year year_strata Demo_Race Demo_Hispanic
## 1 2 4316 1997 1997.514 Hispanic
## 2 2 2845 1997 1997.510 Hispanic
## 3 2 3783 1997 1997.510 Hispanic
## 4 2 2466 1997 1997.510 Hispanic
## 5 2 3794 1997 1997.510 Hispanic
## 6 1 1793 1997 1997.515 Hispanic
## Demo_RaceEthnicity Demo_Region Demo_sex_C Demo_sexorien_C
## 1 Hispanic (Race Identity Unknown) West female
## 2 Hispanic (Race Identity Unknown) West female
## 3 Hispanic (Race Identity Unknown) West male
## 4 Hispanic (Race Identity Unknown) West male
## 5 Hispanic (Race Identity Unknown) West male
## 6 Hispanic (Race Identity Unknown) West female
## Demo_belowpovertyline_B Demo_age_N Demo_agerange_C Demo_marital_C
## 1 1 33 30-39 Married
## 2 0 52 50-59 Married
## 3 0 41 40-49 Married
## 4 0 67 60-69 Widowed
## 5 1 25 18-29 Married
## 6 0 61 60-69 Widowed
## Demo_hourswrk_C MentalHealth_MentalIllnessK6_N MentalHealth_MentalIllnessK6_C
## 1 20-39 0 Low Risk
## 2 None NA
## 3 40-59 0 Low Risk
## 4 1-19 0 Low Risk
## 5 40-59 0 Low Risk
## 6 None 11 MMD
## MentalHealth_SeriousMentalIllnessK6_B MentalHealth_depressionmeds_B
## 1 0 NA
## 2 NA NA
## 3 0 NA
## 4 0 NA
## 5 0 NA
## 6 0 NA
## Health_SelfRatedHealth_C Health_diagnosed_STD5yr_B Health_BirthControlNow_B
## 1 Excellent NA NA
## 2 Very Good NA NA
## 3 Excellent NA NA
## 4 Very Good NA NA
## 5 Good NA NA
## 6 Poor NA NA
## Health_EverHaveHeartAttack_B Health_EverHaveHeartCondition_B
## 1 0 0
## 2 0 0
## 3 0 0
## 4 1 0
## 5 0 0
## 6 0 0
## Health_EverHaveCancer_B Health_EverHaveDiabetes_B
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 1
## 5 0 0
## 6 0 1
## Health_EverHavePrediabetes_B Health_EverHaveAsthma_B Health_StillHaveAsthma_B
## 1 NA 0 0
## 2 NA 0 0
## 3 NA 0 0
## 4 NA 0 0
## 5 NA 0 0
## 6 NA 1 NA
## Health_HIVAidsRisk_C Health_HIVAidsHighRisk_B Health_EverTakeHIVTest_B
## 1 Low 0 1
## 2 None 0 1
## 3 None 0 1
## 4 None 0 1
## 5 None 0 0
## 6 None 0 NA
## Health_EverHaveHypertension_B Health_BMI_N Health_BMI_C
## 1 0 19.73 Normal
## 2 0 25.73 Overweight
## 3 0 36.48 Obese
## 4 1 24.19 Normal
## 5 0 24.80 Normal
## 6 1 27.25 Overweight
## Health_BMIOverweight_B Health_BMIObese_B Health_Weight_N Health_Height_N
## 1 0 0 115 64
## 2 1 0 150 64
## 3 1 1 233 67
## 4 0 0 141 64
## 5 0 0 140 63
## 6 1 0 135 59
## Health_UsualPlaceHealthcare_C Health_UsualPlaceHealthcare_B
## 1 No 0
## 2 Yes 1
## 3 Yes 1
## 4 Yes 1
## 5 No 0
## 6 Yes 1
## Health_AbnormalPapPast3yr_B Behav_EverSmokeCigs_B Behav_CigsPerDay_N
## 1 NA 0 0
## 2 NA 0 0
## 3 NA 1 5
## 4 NA 0 0
## 5 NA 0 0
## 6 NA 1 0
## Behav_CigsPerDay_C Behav_AgeStartSmoking Behav_AlcDaysPerYear_N
## 1 00 None NA NA
## 2 00 None NA NA
## 3 01-09 17 1
## 4 00 None NA 3
## 5 00 None NA 2
## 6 00 None 6 5
## Behav_AlcDaysPerWeek_N Behav_BingeDrinkDaysYear_N Behav_BingeDrinkDaysYear_C
## 1 NA NA
## 2 NA NA
## 3 0 0 0 Days
## 4 0 0 0 Days
## 5 0 0 0 Days
## 6 0 2 001-10 Days
Select Variables
Dataset <- Dataset %>%
  # Keep only the two variables of interest
  select(Demo_sex_C, MentalHealth_depressionmeds_B) %>%
  # Keep respondents with a recorded sex and a 0/1 medication answer (drops missing values)
  filter(Demo_sex_C %in% c("male", "female"), MentalHealth_depressionmeds_B %in% c(0, 1))
Crosstab (Null - Demo_sex_C)
table(Dataset$Demo_sex_C)%>%
prop.table()%>%
round(2)
##
## female male
## 0.55 0.45
Crosstab (Null - MentalHealth_depressionmeds_B)
table(Dataset$MentalHealth_depressionmeds_B)%>%
prop.table()%>%
round(2)
##
## 0 1
## 0.91 0.09
Crosstab (Actual)
table(Dataset$Demo_sex_C,Dataset$MentalHealth_depressionmeds_B)%>%
prop.table()
##
## 0 1
## female 0.48798856 0.06360519
## male 0.42137922 0.02702703
The observed joint proportions are close to what the null hypothesis of independence would predict: each cell differs from its expected value by less than two percentage points. For example, about 6.4% of respondents are females who have been on medication for depression, compared with roughly 5% expected (0.55 × 0.09), and about 2.7% are males who have, compared with roughly 4% expected (0.45 × 0.09).
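The comparison between the null and actual crosstabs can be sketched in code. This is a minimal illustration, not part of the original analysis: it re-enters the observed joint proportions from the output above by hand, and the names `observed`, `expected`, and `gap` are introduced here only for illustration. Under independence, each expected cell is the product of its row and column marginals.

```r
# Observed joint proportions, copied by hand from the actual crosstab output
observed <- matrix(
  c(0.48798856, 0.06360519,
    0.42137922, 0.02702703),
  nrow = 2, byrow = TRUE,
  dimnames = list(sex = c("female", "male"), depressionmeds = c("0", "1"))
)

# Under independence, each cell equals (row marginal) x (column marginal)
expected <- outer(rowSums(observed), colSums(observed))

# Positive values: more respondents in that cell than independence predicts
gap <- observed - expected

print(round(expected, 3))
print(round(gap, 3))
```

With the full data frame available, the same idea would apply to `prop.table(table(...))` directly instead of hand-entered values.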
Relationship of Interest
table(Dataset$Demo_sex_C,Dataset$MentalHealth_depressionmeds_B)%>%
prop.table(1)
##
## 0 1
## female 0.88468834 0.11531166
## male 0.93972647 0.06027353
About 88% of women responded that they have never been on medication for depression, compared with about 94% of men. Equivalently, about 11.5% of women and 6.0% of men have been on medication for depression. This supports my hypothesis that women are more likely to have taken medication for depression.
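To put a size on that gap, the two row proportions can be compared directly. The sketch below is an illustrative addition: the values are copied by hand from the row-proportion output above, and `p_female`, `p_male`, `diff_pp`, and `ratio` are names introduced here, not part of the original analysis.

```r
# Share of each sex ever on depression medication, copied from the row proportions
p_female <- 0.11531166
p_male   <- 0.06027353

# Absolute gap in percentage points
diff_pp <- (p_female - p_male) * 100

# Relative gap: how many times more likely women are than men
ratio <- p_female / p_male

cat(sprintf("Difference: %.1f percentage points\n", diff_pp))
cat(sprintf("Ratio (female / male): %.2f\n", ratio))
```

On these figures the gap is about 5.5 percentage points, and women are roughly 1.9 times as likely as men to report having been on medication for depression.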
Chi-Square Statistical Test
chisq.test(Dataset$Demo_sex_C, Dataset$MentalHealth_depressionmeds_B)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: Dataset$Demo_sex_C and Dataset$MentalHealth_depressionmeds_B
## X-squared = 876.88, df = 1, p-value < 2.2e-16
The p-value is less than 2.2e-16, far below 0.05, so the relationship between sex and ever having been on medication for depression is statistically significant: the two variables are not independent.
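A very small p-value shows that an association exists but not how strong it is, especially in a large survey sample. One common way to gauge strength for a 2x2 table is the phi coefficient. The sketch below is an assumption-laden illustration rather than part of the original analysis: it computes phi from the joint proportions printed earlier (entered by hand), and the names `p` and `phi` are introduced here.

```r
# Observed joint proportions, copied by hand from the actual crosstab output
p <- matrix(
  c(0.48798856, 0.06360519,
    0.42137922, 0.02702703),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("female", "male"), c("0", "1"))
)

# Phi for a 2x2 table: (ad - bc) / sqrt(product of the four marginals)
phi <- (p[1, 1] * p[2, 2] - p[1, 2] * p[2, 1]) /
  sqrt(prod(rowSums(p)) * prod(colSums(p)))

# The sign only reflects category ordering; the magnitude is the effect size
print(round(abs(phi), 3))
```

A magnitude near 0.1 is conventionally read as a weak association, which is consistent with a relationship that is statistically significant but modest in size.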