Setup

Load packages

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.6.2
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.2

Load data

load("brfss2013.RData")

Part 1: Data

The BRFSS data were collected using stratified sampling, so the sample is random and the results can be generalized to the population. However, participants were not randomly assigned to groups: there is no fixed pattern in the number of observations for hours of sleep or exercise. This is an observational study, so associations can be generalized but no causal inference can be made.


Part 2: Research questions

Research question 1: Is there a correlation between the number of days of poor physical health (physhlth) and the number of days of poor mental health (menthlth)? Does the relationship differ by gender (sex)? This is a preliminary exploration of whether physical illness and mental illness are associated.

Research question 2: Is there a correlation between hours of sleep (sleptim1) and poor mental health (menthlth)? We look for an association between hours slept and poor mental health, and for patterns by employment status (employ1).

Research question 3: Is there a correlation between self-reported general health (genhlth) and hours of sleep (sleptim1), and does it differ by gender (sex)? This lets us examine whether 7 or 8 hours of sleep per night is associated with better reported health.


Part 3: Exploratory data analysis

Research question 1:

# Keep the variables of interest and drop rows with a missing value in any of them
set1 <- brfss2013 %>%
  select(menthlth, physhlth, sleptim1, X_bmi5, nummen, numwomen, employ1, sex) %>%
  filter(!is.na(menthlth), !is.na(physhlth), !is.na(sleptim1), !is.na(X_bmi5),
         !is.na(nummen), !is.na(numwomen), !is.na(employ1), !is.na(sex))
summary(set1) 
##     menthlth         physhlth         sleptim1          X_bmi5    
##  Min.   : 0.000   Min.   : 0.000   Min.   : 1.000   Min.   :   1  
##  1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 6.000   1st Qu.:2375  
##  Median : 0.000   Median : 0.000   Median : 7.000   Median :2692  
##  Mean   : 3.105   Mean   : 4.506   Mean   : 7.096   Mean   :2790  
##  3rd Qu.: 2.000   3rd Qu.: 3.000   3rd Qu.: 8.000   3rd Qu.:3085  
##  Max.   :30.000   Max.   :30.000   Max.   :24.000   Max.   :9511  
##                                                                   
##      nummen          numwomen                                employ1      
##  1      :208965   1      :252661   Employed for wages            :118563  
##  0      : 87620   0      : 38920   Retired                       :113993  
##  2      : 22933   2      : 27761   Self-employed                 : 26092  
##  3      :  3319   3      :  3550   Unable to work                : 24512  
##  4      :   481   4      :   419   A homemaker                   : 21837  
##  5      :    62   5      :    72   Out of work for 1 year or more:  8253  
##  (Other):    30   (Other):    27   (Other)                       : 10160  
##      sex        
##  Male  :126122  
##  Female:197288  
##                 
##                 
##                 
##                 
## 
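The summary above shows that set1 keeps 323,410 respondents (126,122 men and 197,288 women) after rows with missing values are dropped. Before filtering on many variables at once, it can help to see which columns contribute the missing values. The following is a minimal sketch in base R; the data frame `toy` and its values are invented for illustration and only borrow a few column names from the data set.

```r
# Hypothetical toy data standing in for a few brfss2013 columns (values invented)
toy <- data.frame(
  menthlth = c(0, 5, NA, 30, 2, NA),
  physhlth = c(1, NA, 3, 10, 0, 4),
  sex      = factor(c("Male", "Female", "Female", NA, "Male", "Female"))
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

# Rows with no missing value in any column
complete <- toy[complete.cases(toy), ]

print(na_per_column)
cat("rows before:", nrow(toy), " rows after:", nrow(complete), "\n")
```

Applied to the real data, the same two calls would show how much of the sample each variable costs, which is useful when deciding whether a variable that is not needed for a given plot should be part of the filter.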
ggplot(set1, aes(physhlth, menthlth)) +
  geom_point(shape = 19, alpha = 1/2, aes(colour = sex)) + geom_smooth(color = "green") +
  facet_grid(. ~ sex) + theme_bw() + xlab("Physical health") + ylab("Mental health")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

The smoothing curves for men and women have very similar shapes, with the same number of peaks and valleys. For a given value of physhlth, however, the curve is higher for women than for men.
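The visual comparison of the two smoothing curves can be complemented by a correlation coefficient computed separately for each sex. The sketch below uses Spearman's rank correlation, which is one reasonable choice for day counts that pile up at 0 and 30; that choice, the data frame `toy`, and its values are assumptions made for illustration and are not results from the BRFSS data.

```r
# Hypothetical toy data: days of poor physical and mental health (0-30) by sex
toy <- data.frame(
  physhlth = c(0, 2, 5, 10, 30, 0, 3, 7, 15, 30),
  menthlth = c(0, 4, 1,  6, 20, 1, 4, 9, 14, 25),
  sex      = factor(rep(c("Male", "Female"), each = 5))
)

# Spearman rank correlation between the two variables within each sex
cor_by_sex <- sapply(split(toy, toy$sex), function(d) {
  cor(d$physhlth, d$menthlth, method = "spearman")
})

print(cor_by_sex)
```

With the real data, `toy` would be replaced by set1; similar coefficients for men and women would be consistent with the similar curve shapes seen in the plot.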

Research question 2:

p <- ggplot(set1, aes(menthlth, sleptim1))
p + geom_point(shape = 19, alpha = 1/2) + geom_smooth(color = "green") +
  facet_grid(. ~ employ1) + theme_bw() + xlab("Mental health") + ylab("No of hours of sleep")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Across the employment-status panels there appears to be a slight negative relationship between days of poor mental health and hours of sleep. Because the panels look similar, the association between sleep and mental health does not appear to differ much by employment status.
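One way to put a number on a "slight negative" relationship is to fit a separate straight line for each panel group and compare the slopes. The following is a minimal sketch, assuming a simple linear fit is an acceptable summary; the data frame `toy` and its values are invented and only reuse the column names and two of the employ1 labels shown in the summary.

```r
# Hypothetical toy data: days of poor mental health vs. hours of sleep (values invented)
toy <- data.frame(
  menthlth = c(0, 5, 10, 20, 30, 0, 5, 10, 20, 30),
  sleptim1 = c(8, 7,  7,  6,  5, 7, 7,  6,  6,  5),
  employ1  = factor(rep(c("Employed for wages", "Retired"), each = 5))
)

# Slope of sleptim1 on menthlth, fitted separately within each employment group
slopes <- sapply(split(toy, toy$employ1), function(d) {
  unname(coef(lm(sleptim1 ~ menthlth, data = d))["menthlth"])
})

print(slopes)
```

A negative slope means fewer reported hours of sleep as the number of poor mental health days rises. Slopes of similar size across groups would support the reading that the panels look alike.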

Research question 3:

# Keep the variables of interest, drop missing values, and keep at most 12 hours of sleep
set2 <- brfss2013 %>%
  select(sex, menthlth, genhlth, sleptim1) %>%
  filter(!is.na(sex), !is.na(menthlth), !is.na(genhlth), sleptim1 <= 12)
summary(set2)
##      sex            menthlth           genhlth          sleptim1     
##  Male  :194633   Min.   : 0.000   Excellent: 84029   Min.   : 1.000  
##  Female:278819   1st Qu.: 0.000   Very good:156038   1st Qu.: 6.000  
##                  Median : 0.000   Good     :145412   Median : 7.000  
##                  Mean   : 3.317   Fair     : 62799   Mean   : 7.023  
##                  3rd Qu.: 2.000   Poor     : 25174   3rd Qu.: 8.000  
##                  Max.   :30.000                      Max.   :12.000
q <- ggplot(set2, aes(genhlth, sleptim1))
q + geom_boxplot() + scale_y_continuous(limits = c(4,10), breaks = 5:9) + facet_grid(. ~ sex) + xlab("Self reported general health") + 
  ylab ("No. of hours of sleep") + theme_bw()
## Warning: Removed 8906 rows containing non-finite values (stat_boxplot).
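The warning above arises because limits set through scale_y_continuous() discard values outside the range before the boxplot statistics are computed, so the boxes describe only respondents who reported between 4 and 10 hours. The sketch below illustrates the effect in base R with invented sleep values; `sleep` is a hypothetical vector, not data from the survey. In ggplot2, coord_cartesian(ylim = c(4, 10)) is the usual alternative that zooms the axis without dropping observations.

```r
# Hypothetical hours-of-sleep values for one group, including short and long sleepers
sleep <- c(2, 3, 3, 5, 6, 6, 7, 7, 7, 8, 8, 9, 11, 12)

# Box statistics (lower whisker, lower hinge, median, upper hinge, upper whisker)
# computed from all values
all_values <- boxplot.stats(sleep)$stats

# The same statistics after dropping values outside [4, 10],
# which mimics what axis limits on a continuous scale do to the data
inside <- sleep[sleep >= 4 & sleep <= 10]
censored <- boxplot.stats(inside)$stats

print(rbind(all_values, censored))
```

Comparing the two rows shows that the hinges and whiskers can move once the tails are removed, so a box drawn with scale limits may look narrower than the full data would suggest.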

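There appear to be no marked differences between men and women. However, among women, poor self-reported general health appears to go together with fewer hours of sleep. Because this is an observational study, this is an association and does not show that less sleep causes poorer health.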
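The boxplot comparison can be backed by a small table of median hours of sleep for each combination of general health and sex. The following is a minimal sketch in base R; the data frame `toy` and its values are invented for illustration, and only two of the five genhlth levels are included to keep it short.

```r
# Hypothetical toy data: hours of sleep by sex and self-reported general health
toy <- data.frame(
  sex      = factor(rep(c("Male", "Female"), each = 4)),
  genhlth  = factor(rep(c("Excellent", "Excellent", "Poor", "Poor"), times = 2),
                    levels = c("Excellent", "Poor")),
  sleptim1 = c(7, 8, 6, 7, 8, 8, 5, 6)
)

# Median hours of sleep for each genhlth-by-sex combination
med <- aggregate(sleptim1 ~ genhlth + sex, data = toy, FUN = median)

print(med)
```

With the real data, `toy` would be replaced by set2, and the resulting table would show directly whether the medians for men and women differ within each health category.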