Research Question / Hypothesis

Hypothesis: There is a relationship between mental_health and smoking_history. Specifically, I hypothesize that having smoked 100 or more cigarettes in one's lifetime is associated with a person's mental health status.
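The chi-squared test of independence used later in this analysis evaluates this hypothesis. One way to state it formally is as a null of independence against a general alternative, where m ranges over the mental_health categories and s over the smoking_history categories:

```latex
H_0:\ P(\text{mental\_health} = m,\ \text{smoking\_history} = s) = P(\text{mental\_health} = m)\,P(\text{smoking\_history} = s) \quad \text{for all } m, s

H_1:\ \text{the equality above fails for at least one pair } (m, s)
```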

Package Loading, Data Import, Data Prep

library(readr)    # read_csv()
library(dplyr)    # select(), group_by(), summarize(), mutate(), %>%
library(ggplot2)  # ggplot(), geom_col()

Health <- read_csv("/Users/rameasaarna/Desktop/skill drill 2/SD2 Data(1).csv")
head(Health)
## # A tibble: 6 x 16
##   sex   race  marital_status poverty_status age_range health bmi_category
##   <chr> <chr> <chr>          <chr>          <chr>     <chr>  <chr>       
## 1 fema… White Never Married  above poverty  18-29     Excel… Normal      
## 2 fema… White Married        above poverty  50-59     Very … Normal      
## 3 male  White DivorcedOrSep… above poverty  60-69     Good   Normal      
## 4 male  White DivorcedOrSep… above poverty  50-59     Fair   Obese       
## 5 fema… White Married        above poverty  30-39     Very … Normal      
## 6 male  White Never Married  above poverty  18-29     Excel… Normal      
## # … with 9 more variables: mental_health <chr>, heart_attack_history <chr>,
## #   heart_condition_history <chr>, cancer_history <chr>,
## #   prediabetes_history <chr>, asthma_history <chr>,
## #   hypertension_history <chr>, smoking_history <chr>,
## #   birthcontrol_status <lgl>
MentalHealth <- Health %>%
  select(mental_health, smoking_history)  # keep only the two analysis variables

MentalHealth
## # A tibble: 255,915 x 2
##    mental_health            smoking_history
##    <chr>                    <chr>          
##  1 Low Risk                 Yes            
##  2 Low Risk                 No             
##  3 Low Risk                 Yes            
##  4 Moderate Mental Distress Yes            
##  5 Low Risk                 Yes            
##  6 Moderate Mental Distress No             
##  7 Low Risk                 No             
##  8 Low Risk                 No             
##  9 Low Risk                 Yes            
## 10 Low Risk                 No             
## # … with 255,905 more rows
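Because `table()` drops missing values by default (`useNA = "no"`), it can be worth confirming that neither analysis variable contains `NA` before computing proportions. The sketch below uses a hypothetical toy data frame (`toy`, not the survey data) to show the check in base R; on the real data the same calls would be applied to `MentalHealth`.

```r
# Hypothetical toy data frame standing in for MentalHealth; values are illustrative only
toy <- data.frame(
  mental_health   = c("Low Risk", NA, "Serious Mental Illness", "Low Risk"),
  smoking_history = c("Yes", "No", NA, "No")
)

# Number of missing values in each column
missing_counts <- colSums(is.na(toy))
print(missing_counts)

# useNA = "ifany" makes table() show an <NA> count instead of silently dropping it
print(table(toy$smoking_history, useNA = "ifany"))

# Keep only rows where both analysis variables are present
toy_complete <- toy[complete.cases(toy[, c("mental_health", "smoking_history")]), ]
print(nrow(toy_complete))
```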

Data Summary

Distribution of smoking_history

table(MentalHealth$smoking_history) %>%  # independent variable
  prop.table() %>%
  round(2)
## 
##  No Yes 
## 0.6 0.4

Distribution of mental_health

table(MentalHealth$mental_health) %>%  # dependent variable
  prop.table() %>%
  round(2)
## 
##                 Low Risk Moderate Mental Distress   Serious Mental Illness 
##                     0.80                     0.16                     0.03

The following table shows the expected count for each combination of mental_health and smoking_history, i.e., the counts we would expect if the two variables were independent.

chisq.test(MentalHealth$mental_health, MentalHealth$smoking_history)["expected"]  # select the expected counts by name
## $expected
##                           MentalHealth$smoking_history
## MentalHealth$mental_health         No       Yes
##   Low Risk                 122904.316 83007.684
##   Moderate Mental Distress  24729.247 16701.753
##   Serious Mental Illness     5116.437  3455.563
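Under independence, each expected count is the cell's row total times its column total, divided by the grand total. Summing the observed counts reported in this section gives a Low Risk row total of 205,912, a No column total of 152,750, and N = 255,915, so the first cell works out as below; the Pearson statistic then adds up the squared, scaled gaps between observed and expected counts over all six cells:

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
E_{\text{Low Risk},\,\text{No}} = \frac{205912 \times 152750}{255915} \approx 122904.3

X^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
```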

The following table shows the observed count for each combination of mental_health and smoking_history.

chisq.test(MentalHealth$mental_health, MentalHealth$smoking_history)["observed"]  # select the observed counts by name
## $observed
##                           MentalHealth$smoking_history
## MentalHealth$mental_health     No    Yes
##   Low Risk                 128367  77545
##   Moderate Mental Distress  21053  20378
##   Serious Mental Illness     3330   5242

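The observed counts differ from the expected counts. For example, 5,242 people who have smoked 100 or more cigarettes report Serious Mental Illness, compared with about 3,456 expected under independence, while 3,330 people who have not report it, compared with about 5,116 expected.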
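To see where the observed and expected tables diverge, the expected counts can be rebuilt from the observed counts and compared cell by cell. This is a minimal sketch that hard-codes the observed counts printed above into a matrix (the name `observed` is introduced here for illustration); Pearson residuals, (O - E) / sqrt(E), are positive where a cell has more respondents than independence predicts and negative where it has fewer.

```r
# Observed counts copied from the chisq.test() output above
# (rows = mental_health, columns = smoking_history; matrix() fills column by column)
observed <- matrix(
  c(128367, 21053, 3330,   # No
    77545, 20378, 5242),   # Yes
  nrow = 3,
  dimnames = list(
    mental_health   = c("Low Risk", "Moderate Mental Distress", "Serious Mental Illness"),
    smoking_history = c("No", "Yes")
  )
)

# Expected counts under independence: row total x column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
print(round(expected, 3))

# Sanity check: should agree with the expected counts chisq.test() computes
print(all.equal(unname(expected), unname(chisq.test(observed)$expected)))

# Pearson residuals: sign gives the direction of each cell's departure from independence
pearson_resid <- (observed - expected) / sqrt(expected)
print(round(pearson_resid, 1))
```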

Data Analysis

Relationship of Interest: Crosstab showing Column%

table(MentalHealth$mental_health, MentalHealth$smoking_history) %>%
  prop.table(2)  # margin = 2: proportions within each smoking_history column
##                           
##                                    No        Yes
##   Low Risk                 0.84037316 0.75165996
##   Moderate Mental Distress 0.13782651 0.19752823
##   Serious Mental Illness   0.02180033 0.05081181
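The column percentages can also be compared directly between the two smoking groups. The sketch below copies the proportions from the crosstab above and computes, for each mental_health category, the ratio of its share among people who have smoked 100 or more cigarettes to its share among those who have not, plus the difference in percentage points; the variable names are illustrative.

```r
# Column proportions copied from the crosstab above (each column sums to 1)
col_pct <- matrix(
  c(0.84037316, 0.13782651, 0.02180033,   # No
    0.75165996, 0.19752823, 0.05081181),  # Yes
  nrow = 3,
  dimnames = list(
    c("Low Risk", "Moderate Mental Distress", "Serious Mental Illness"),
    c("No", "Yes")
  )
)

# Ratio of each category's share among smokers (Yes) to its share among non-smokers (No)
ratio_yes_vs_no <- col_pct[, "Yes"] / col_pct[, "No"]
print(round(ratio_yes_vs_no, 2))

# Difference in percentage points (Yes minus No)
pp_diff <- 100 * (col_pct[, "Yes"] - col_pct[, "No"])
print(round(pp_diff, 1))
```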

Relationship of Interest: Visualization

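MentalHealth %>%
  group_by(smoking_history, mental_health) %>%
  summarize(n = n(), .groups = "drop_last") %>%  # result stays grouped by smoking_history
  mutate(percent = n / sum(n)) %>%               # share of each mental_health level within each smoking group
  ggplot() +
  geom_col(aes(x = smoking_history, y = percent, fill = mental_health)) +  # independent variable on x, outcome in fill
  labs(x = "Smoked 100+ cigarettes in lifetime", y = "Percent", fill = "Mental health")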
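An alternative sketch of the same stacked bar chart lets ggplot2 do the percentage calculation: `geom_col(position = "fill")` rescales each bar so its segments sum to 1. To keep the example self-contained it builds a small data frame (`counts`, a name introduced here) from the observed counts printed earlier rather than reading the survey file.

```r
library(ggplot2)

# Observed counts copied from the chisq.test() output above
counts <- data.frame(
  smoking_history = rep(c("No", "Yes"), each = 3),
  mental_health   = rep(c("Low Risk", "Moderate Mental Distress", "Serious Mental Illness"), times = 2),
  n               = c(128367, 21053, 3330, 77545, 20378, 5242)
)

# position = "fill" stacks the counts and scales each bar to a total of 1
p <- ggplot(counts, aes(x = smoking_history, y = n, fill = mental_health)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(
    x = "Smoked 100+ cigarettes in lifetime",
    y = "Share of respondents",
    fill = "Mental health"
  )

print(p)
```

Because the rescaling happens inside the geom, there is no need for the `group_by()` / `mutate()` step, and the y-axis labels read as percentages.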

From the analysis, people who have not smoked 100 or more cigarettes in their lifetime are more likely to be in the Low Risk mental health category (84.0%) than those who have (75.2%). Conversely, people who have smoked 100 or more cigarettes are more likely to report Moderate Mental Distress (19.8% vs. 13.8%) and Serious Mental Illness (5.1% vs. 2.2%) than those who have not.

Chi-Squared Test

options(scipen = 999)  # print numbers such as p-values in fixed rather than scientific notation
chisq.test(MentalHealth$smoking_history, MentalHealth$mental_health)
## 
##  Pearson's Chi-squared test
## 
## data:  MentalHealth$smoking_history and MentalHealth$mental_health
## X-squared = 3505.3, df = 2, p-value < 0.00000000000000022

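The results above indicate a statistically significant association between smoking_history and mental_health (X-squared = 3505.3, df = 2, p-value < 2.2e-16): people who have smoked 100 or more cigarettes in their lifetime are more likely to report moderate or serious mental health problems than people who have not.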
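With 255,915 respondents, even a modest association produces a very small p-value, so an effect size can help describe how strong the relationship is. A minimal sketch of Cramér's V, sqrt(X^2 / (n * (min(rows, cols) - 1))), computed from the observed counts printed earlier (the matrix is re-entered here so the block runs on its own):

```r
# Observed counts copied from the chisq.test() output above
# (rows = mental_health levels, columns = smoking_history No / Yes)
observed <- matrix(
  c(128367, 21053, 3330, 77545, 20378, 5242),
  nrow = 3
)

test <- chisq.test(observed)

# Cramér's V: scale the chi-squared statistic by sample size and table dimensions
n <- sum(observed)
k <- min(dim(observed)) - 1          # equals 1 for a 3 x 2 table
cramers_v <- sqrt(unname(test$statistic) / (n * k))
print(round(cramers_v, 3))
```

By commonly cited rules of thumb, values near 0.1 are usually read as a small association, which is a useful complement to the p-value when the sample is this large.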