I hypothesize that mental_health will not differ by smoking_history.
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
data <- read_csv("C:/Users/jammi/Downloads/SD2 Data(1).csv")
## Parsed with column specification:
## cols(
## sex = col_character(),
## race = col_character(),
## marital_status = col_character(),
## poverty_status = col_character(),
## age_range = col_character(),
## health = col_character(),
## bmi_category = col_character(),
## mental_health = col_character(),
## heart_attack_history = col_character(),
## heart_condition_history = col_character(),
## cancer_history = col_character(),
## prediabetes_history = col_character(),
## asthma_history = col_character(),
## hypertension_history = col_character(),
## smoking_history = col_character(),
## birthcontrol_status = col_logical()
## )
head(data)
## # A tibble: 6 x 16
## sex race marital_status poverty_status age_range health bmi_category
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 fema~ White Never Married above poverty 18-29 Excel~ Normal
## 2 fema~ White Married above poverty 50-59 Very ~ Normal
## 3 male White DivorcedOrSep~ above poverty 60-69 Good Normal
## 4 male White DivorcedOrSep~ above poverty 50-59 Fair Obese
## 5 fema~ White Married above poverty 30-39 Very ~ Normal
## 6 male White Never Married above poverty 18-29 Excel~ Normal
## # ... with 9 more variables: mental_health <chr>, heart_attack_history <chr>,
## # heart_condition_history <chr>, cancer_history <chr>,
## # prediabetes_history <chr>, asthma_history <chr>,
## # hypertension_history <chr>, smoking_history <chr>,
## # birthcontrol_status <lgl>
table(data$smoking_history)%>%
prop.table()%>%
round(2)
##
## No Yes
## 0.6 0.4
table(data$mental_health)%>%
prop.table()%>%
round(2)
##
## Low Risk Moderate Mental Distress Serious Mental Illness
## 0.80 0.16 0.03
table(data$smoking_history,data$mental_health)%>%
prop.table()
##
## Low Risk Moderate Mental Distress Serious Mental Illness
## No 0.50160014 0.08226560 0.01301213
## Yes 0.30301077 0.07962800 0.02048336
table(data$smoking_history,data$mental_health)%>%
prop.table()%>%
round(2)
##
## Low Risk Moderate Mental Distress Serious Mental Illness
## No 0.50 0.08 0.01
## Yes 0.30 0.08 0.02
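The crosstab above shows table percentages, so each cell is the share of all respondents in that combination. About 50% of respondents are non-smokers at low risk and about 30% are smokers at low risk. About 8% fall under moderate mental distress in each smoking group, while about 1% are non-smokers with serious mental illness and about 2% are smokers with serious mental illness. In absolute terms, these values are not very different from the observations expected under the null hypothesis. The table below shows the counts and, within each smoking_history group, the percent of responses in each mental_health category.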
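To make the comparison with the null hypothesis concrete, here is a minimal sketch of how the expected table percentages could be computed. It assumes independence means each cell equals its row share times its column share, and it rebuilds the crosstab from the counts reported in this document instead of reading the CSV. The object names `counts`, `observed`, and `expected` are illustrative, not part of the original analysis.

```r
# Counts per smoking_history x mental_health cell, copied from the
# grouped summary in this report (an assumption: they match the full data).
counts <- matrix(
  c(128367, 21053, 3330,
     77545, 20378, 5242),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    smoking_history = c("No", "Yes"),
    mental_health = c("Low Risk", "Moderate Mental Distress",
                      "Serious Mental Illness")
  )
)

# Observed table percentages: each cell as a share of all respondents.
observed <- prop.table(counts)

# Expected shares under the null hypothesis of independence:
# row share times column share for every cell.
expected <- outer(rowSums(observed), colSums(observed))

# Observed minus expected; values near zero mean the cell is close
# to what the null hypothesis predicts.
round(observed - expected, 3)
```

Positive differences mark combinations that occur more often than independence would predict, and negative ones mark combinations that occur less often.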
data%>%
group_by(smoking_history,mental_health)%>%
summarize(n=n())%>%
mutate(percent=n/sum(n))
## `summarise()` regrouping output by 'smoking_history' (override with `.groups` argument)
## # A tibble: 6 x 4
## # Groups: smoking_history [2]
## smoking_history mental_health n percent
## <chr> <chr> <int> <dbl>
## 1 No Low Risk 128367 0.840
## 2 No Moderate Mental Distress 21053 0.138
## 3 No Serious Mental Illness 3330 0.0218
## 4 Yes Low Risk 77545 0.752
## 5 Yes Moderate Mental Distress 20378 0.198
## 6 Yes Serious Mental Illness 5242 0.0508
data%>%
group_by(smoking_history,mental_health)%>%
summarize(n=n())%>%
mutate(percent=n/sum(n))%>%
ggplot()+
geom_col(aes(x=smoking_history,y=percent, fill=mental_health))
## `summarise()` regrouping output by 'smoking_history' (override with `.groups` argument)
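The stacked bars show that respondents with a smoking history are less often at low risk (about 75% versus 84%) and more often report moderate mental distress (about 20% versus 14%) or serious mental illness (about 5% versus 2%) than respondents without a smoking history.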
chisq.test(data$smoking_history,data$mental_health)
##
## Pearson's Chi-squared test
##
## data: data$smoking_history and data$mental_health
## X-squared = 3505.3, df = 2, p-value < 2.2e-16
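There is a statistically significant relationship between smoking_history and mental_health (X-squared = 3505.3, df = 2, p-value < 2.2e-16), so I reject the hypothesis that mental_health does not differ by smoking_history.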
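With a sample this large, even small differences can be statistically significant, so it may help to look at an effect size alongside the p-value. The sketch below is not part of the original analysis. It assumes Cramér's V is an acceptable measure of association here and rebuilds the crosstab from the counts reported above. The names `counts`, `test`, and `cramers_v` are illustrative.

```r
# Counts per smoking_history x mental_health cell, copied from the
# grouped summary in this report (an assumption: they match the full data).
counts <- matrix(
  c(128367, 21053, 3330,
     77545, 20378, 5242),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    smoking_history = c("No", "Yes"),
    mental_health = c("Low Risk", "Moderate Mental Distress",
                      "Serious Mental Illness")
  )
)

# Pearson's chi-squared test on the table of counts.
test <- chisq.test(counts)

# Cramer's V: sqrt(X^2 / (n * (k - 1))), where k is the smaller of the
# number of rows and the number of columns.
n <- sum(counts)
k <- min(dim(counts))
cramers_v <- sqrt(unname(test$statistic) / (n * (k - 1)))
cramers_v
```

Values of V close to 0 indicate a weak association and values close to 1 a strong one, which gives some context for how large the difference between the smoking groups is.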