Hypothesis: There is a relationship between mental_health and smoking_history. Specifically, having smoked 100 or more cigarettes in one's lifetime has an impact on mental health.
library(readr)
library(dplyr)
library(ggplot2)
Health = read_csv("/Users/rameasaarna/Desktop/skill drill 2/SD2 Data(1).csv")
head(Health)
## # A tibble: 6 x 16
## sex race marital_status poverty_status age_range health bmi_category
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 fema… White Never Married above poverty 18-29 Excel… Normal
## 2 fema… White Married above poverty 50-59 Very … Normal
## 3 male White DivorcedOrSep… above poverty 60-69 Good Normal
## 4 male White DivorcedOrSep… above poverty 50-59 Fair Obese
## 5 fema… White Married above poverty 30-39 Very … Normal
## 6 male White Never Married above poverty 18-29 Excel… Normal
## # … with 9 more variables: mental_health <chr>, heart_attack_history <chr>,
## # heart_condition_history <chr>, cancer_history <chr>,
## # prediabetes_history <chr>, asthma_history <chr>,
## # hypertension_history <chr>, smoking_history <chr>,
## # birthcontrol_status <lgl>
MentalHealth = Health%>%
select(mental_health, smoking_history )
MentalHealth
## # A tibble: 255,915 x 2
## mental_health smoking_history
## <chr> <chr>
## 1 Low Risk Yes
## 2 Low Risk No
## 3 Low Risk Yes
## 4 Moderate Mental Distress Yes
## 5 Low Risk Yes
## 6 Moderate Mental Distress No
## 7 Low Risk No
## 8 Low Risk No
## 9 Low Risk Yes
## 10 Low Risk No
## # … with 255,905 more rows
table(MentalHealth$smoking_history)%>% # independent variable
prop.table()%>%
round(2)
##
## No Yes
## 0.6 0.4
table(MentalHealth$mental_health)%>% #dependent variable
prop.table()%>%
round(2)
##
## Low Risk Moderate Mental Distress Serious Mental Illness
## 0.80 0.16 0.03
chisq.test(MentalHealth$mental_health, MentalHealth$smoking_history)["expected"] # select by name rather than by position
## $expected
## MentalHealth$smoking_history
## MentalHealth$mental_health No Yes
## Low Risk 122904.316 83007.684
## Moderate Mental Distress 24729.247 16701.753
## Serious Mental Illness 5116.437 3455.563
chisq.test(MentalHealth$mental_health, MentalHealth$smoking_history)["observed"] # select by name rather than by position
## $observed
## MentalHealth$smoking_history
## MentalHealth$mental_health No Yes
## Low Risk 128367 77545
## Moderate Mental Distress 21053 20378
## Serious Mental Illness 3330 5242
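The observed counts differ from the expected counts. For example, 5,242 people who smoked are classified as Serious Mental Illness, whereas about 3,456 would be expected if smoking_history and mental_health were independent.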
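To make the comparison concrete, the expected counts can be reproduced by hand. The sketch below assumes the observed counts are typed in from the output above (the `observed` and `expected` names are illustrative, not part of the original analysis) and uses the usual rule that each expected cell is row total times column total divided by the grand total.

```r
# Observed counts copied from the chisq.test() output above
observed <- matrix(
  c(128367, 77545,
     21053, 20378,
      3330,  5242),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    mental_health = c("Low Risk", "Moderate Mental Distress", "Serious Mental Illness"),
    smoking_history = c("No", "Yes")
  )
)

# Expected count per cell under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 3)
```

The rounded values should agree with the `$expected` table printed by `chisq.test()`.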
table(MentalHealth$mental_health, MentalHealth$smoking_history)%>%
prop.table(2)
##
## No Yes
## Low Risk 0.84037316 0.75165996
## Moderate Mental Distress 0.13782651 0.19752823
## Serious Mental Illness 0.02180033 0.05081181
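The `2` passed to `prop.table()` asks for column proportions, so each smoking_history column sums to 1. As a rough check, the same numbers can be rebuilt from the observed counts; the `observed` matrix below is typed in from the earlier output and the object names are assumptions made for illustration.

```r
# Observed counts copied from the chisq.test() output above
observed <- matrix(
  c(128367, 77545,
     21053, 20378,
      3330,  5242),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    mental_health = c("Low Risk", "Moderate Mental Distress", "Serious Mental Illness"),
    smoking_history = c("No", "Yes")
  )
)

# Divide every cell by its column total (margin 2 = columns)
col_props <- sweep(observed, 2, colSums(observed), "/")
col_props

# Each column should sum to 1
colSums(col_props)
```

Using column proportions puts the independent variable (smoking_history) in the denominator, which is what allows the two groups to be compared directly.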
MentalHealth%>%
group_by(smoking_history, mental_health)%>%
summarize(n=n()) %>%
mutate(percent=n/sum(n)) %>%
ggplot()+
geom_col(aes(x= smoking_history, y=percent, fill=mental_health)) # independent goes in x, outcome/dependent goes in fill
From the analysis, it is clear that people who have not smoked 100 or more cigarettes in their lifetime are more likely to be at low risk (about 84%) than people who have (about 75%). Moreover, people who have smoked 100 or more cigarettes are more likely to fall into Moderate Mental Distress (about 20% versus 14%) and Serious Mental Illness (about 5% versus 2%) than people who have not.
options(scipen = 999)
chisq.test(MentalHealth$smoking_history, MentalHealth$mental_health)
##
## Pearson's Chi-squared test
##
## data: MentalHealth$smoking_history and MentalHealth$mental_health
## X-squared = 3505.3, df = 2, p-value < 0.00000000000000022
The results above indicate a statistically significant relationship between smoking_history and mental_health (X-squared = 3505.3, df = 2, p < 0.001). People who have smoked 100 or more cigarettes are more likely to experience mental distress or serious mental illness than people who have not.
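For readers who want to see where the test statistic comes from, here is a minimal sketch that recomputes it from the observed counts reported earlier. The counts are typed in by hand and the object names (`observed`, `expected`, `chi_sq`, `df`) are assumptions for illustration; the statistic is the sum over all cells of (observed - expected)^2 / expected.

```r
# Observed counts copied from the chisq.test() output above
observed <- matrix(
  c(128367, 77545,
     21053, 20378,
      3330,  5242),
  nrow = 3, byrow = TRUE
)

# Expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-squared statistic and its degrees of freedom
chi_sq <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(chi_sq, 1)
df
```

The statistic and degrees of freedom should match the `chisq.test()` output; the p-value is too small to print meaningfully, which is consistent with the `< 0.00000000000000022` shown above.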