Setup

Load packages

library(ggplot2)
library(dplyr)

Load data

load("brfss2013.RData")

Part 1: Data

In BRFSS, the observations in the sample were collected by telephone survey using stratified random sampling. Because respondents were randomly sampled, the results can be generalized to the population that was surveyed. Because there was no random assignment to treatment groups, this is an observational study.

Because this is an observational study, we cannot infer causality; we can only describe associations. Causal conclusions would require an experimental study with random assignment.


Part 2: Research questions

Research question 1:

In the U.S. population, is there a correlation between the amount of sleep and general health? In our busy lives, it is difficult to get adequate sleep. Although this is an observational study and cannot establish causation, I am interested in whether a causal relationship might exist.

Variables: sleptim1: Amount of sleep; genhlth: General health

Research question 2:

Is there a correlation between income level and overall life satisfaction? Further, does this correlation differ between genders? I am interested to see whether there are any noticeable trends between income level and reported satisfaction.

Variables: lsatisfy: Satisfaction with life; income2: Income level; sex: Respondent's sex

Research question 3:

Is there a relationship between prediabetes and BMI for males and females? An association between prediabetes and BMI would help us better understand this increasingly common condition.

Variables: prediab1: Prediabetes condition; X_bmi5: Body Mass Index; sex: Respondent's sex


Part 3: Exploratory data analysis

Research question 1:

q1 <- brfss2013 %>%
  select(genhlth , sleptim1) %>%
  # Drop missing general health; keep sleep times under 24 hours (rows with missing sleep are dropped too)
  filter(!is.na(genhlth) , sleptim1 < 24) %>%
  group_by(genhlth) %>% 
  summarise(mn_sleep = mean(sleptim1))
#Display
q1
## # A tibble: 5 x 2
##   genhlth   mn_sleep
##   <fct>        <dbl>
## 1 Excellent     7.19
## 2 Very good     7.10
## 3 Good          7.04
## 4 Fair          6.89
## 5 Poor          6.73
#Make plot of mean sleep grouped by general health
ggplot(data = q1 , aes(genhlth ,mn_sleep)) +
  geom_point()+
  labs(title="Mean Hours of Sleep by General Health Rating",
       x="General Health rating", y="Mean Hours of Sleep")

The table and plot above suggest a relationship between general health and hours of sleep. Participants who reported excellent general health slept the longest on average (7.19 hours), and mean sleep decreases with each lower health rating, down to 6.73 hours for poor health.
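To put a number on the pattern in the table, one option is a rank correlation, since genhlth is an ordered rating rather than a numeric scale. The sketch below is only an illustration: `toy` is a made-up data frame (not BRFSS records), and it assumes the factor levels run from Excellent to Poor as in the table above.

```r
# Toy data for illustration only; these values are invented, not BRFSS records.
toy <- data.frame(
  genhlth = factor(
    c("Excellent", "Excellent", "Very good", "Good", "Good", "Fair", "Poor", "Poor"),
    levels = c("Excellent", "Very good", "Good", "Fair", "Poor")
  ),
  sleptim1 = c(8, 7, 7, 7, 6, 6, 5, 6)
)

# Convert the ordered rating to integer codes: 1 = Excellent, ..., 5 = Poor.
health_rank <- as.integer(toy$genhlth)

# Spearman's rho works on ranks only, so it suits an ordinal health rating.
rho <- cor(health_rank, toy$sleptim1, method = "spearman")
rho
```

A negative rho would mean that worse health ratings go with fewer hours of sleep. On the real data, rows with missing values would need to be removed first, or `use = "complete.obs"` passed to `cor()`.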

Research question 2:

q2 <- select(brfss2013, lsatisfy , sex, income2) %>% 
#Filtering na from the variable  
  filter(!is.na(lsatisfy), !is.na(sex), !is.na(income2)) 
q2 %>% group_by(lsatisfy) %>%    
  summarise(count=n()) 
## # A tibble: 4 x 2
##   lsatisfy          count
##   <fct>             <int>
## 1 Very satisfied     4290
## 2 Satisfied          4418
## 3 Dissatisfied        490
## 4 Very dissatisfied   134
q2 %>% group_by(income2) %>%  
  summarise(count=n())
## # A tibble: 8 x 2
##   income2           count
##   <fct>             <int>
## 1 Less than $10,000   864
## 2 Less than $15,000   985
## 3 Less than $20,000  1039
## 4 Less than $25,000  1044
## 5 Less than $35,000  1179
## 6 Less than $50,000  1274
## 7 Less than $75,000  1241
## 8 $75,000 or more    1706
q2 %>% group_by(sex) %>%  
  summarise(count=n()) 
## # A tibble: 2 x 2
##   sex    count
##   <fct>  <int>
## 1 Male    3418
## 2 Female  5914
#Plot life satisfaction against income level, faceted by sex
ggplot(data = q2, aes(x = lsatisfy , y = income2 )) +
  geom_count () +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  facet_grid( ~  sex) +
  xlab("Satisfaction With Life") +
  ylab ("Income Level")

When we look at the reported numbers of very satisfied and satisfied respondents, the counts trend upward as we move up the income scale. This suggests an association, and more research could help identify possible causal relationships. Finally, short of a more robust analysis to identify causation, I believe this survey would benefit from further segmentation of those who earn more than $75,000, to see how even higher earners fare in terms of satisfaction for males and females.
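Raw counts depend on how many respondents fall into each income bracket and sex, so the share of satisfied respondents within each group may be easier to compare than the counts. The following is a minimal sketch on invented data: `toy` and `satisfied_share` are hypothetical names, and only the column names and category labels are borrowed from the analysis above.

```r
library(dplyr)

# Toy data for illustration only; invented values that reuse the BRFSS column names.
toy <- data.frame(
  sex      = c("Male", "Male", "Male", "Female", "Female", "Female", "Female", "Male"),
  income2  = c("Less than $10,000", "Less than $10,000", "$75,000 or more",
               "Less than $10,000", "$75,000 or more", "$75,000 or more",
               "Less than $10,000", "$75,000 or more"),
  lsatisfy = c("Satisfied", "Dissatisfied", "Very satisfied",
               "Dissatisfied", "Satisfied", "Very satisfied",
               "Very dissatisfied", "Dissatisfied")
)

# Share of respondents who are satisfied or very satisfied in each sex and income group.
satisfied_share <- toy %>%
  group_by(sex, income2) %>%
  summarise(
    n = n(),
    prop_satisfied = mean(lsatisfy %in% c("Very satisfied", "Satisfied"))
  ) %>%
  ungroup()
satisfied_share
```

Applied to q2, the same grouping would show whether the proportion of satisfied respondents rises with income separately for males and females, independent of group size.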

Research question 3:

q3 <- brfss2013 %>% 
  filter(!is.na(prediab1), !is.na(X_bmi5)) %>% 
  mutate(bmi = X_bmi5 / 100)
#Making plot of prediabetes and BMI
ggplot(q3, aes(x = prediab1, y = bmi)) + 
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  facet_wrap(~sex) + 
  xlab("Prediabetes Condition") +
  ylab("BMI")

From this exploratory analysis alone I cannot confirm a relationship between prediabetes and BMI. However, the plot shows that BMI is slightly higher among respondents who report prediabetes, and the same pattern appears for both males and females.
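Differences between boxplots are hard to judge by eye, so a table of group medians can make the size of the gap explicit. The sketch below uses invented data: `toy` and `bmi_summary` are hypothetical names, prediab1 is simplified to two levels, and X_bmi5 is assumed to be stored as BMI times 100, as in the mutate step above.

```r
library(dplyr)

# Toy data for illustration only; invented values, with prediab1 reduced to Yes/No.
toy <- data.frame(
  sex      = c("Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female"),
  prediab1 = c("Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No"),
  X_bmi5   = c(3120, 2980, 2650, 2710, 3040, 2890, 2480, 2600)
)

# Median and interquartile range of BMI for each sex and prediabetes group.
bmi_summary <- toy %>%
  mutate(bmi = X_bmi5 / 100) %>%
  group_by(sex, prediab1) %>%
  summarise(
    n = n(),
    median_bmi = median(bmi),
    iqr_bmi = IQR(bmi)
  ) %>%
  ungroup()
bmi_summary
```

Run against q3, a summary like this would show how large the BMI difference between the prediabetes groups is for each sex, which the plot alone only hints at.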