Setup

Load packages

library(ggplot2)
library(dplyr)

Load data

load("~/4tiY2fqCQa-YmNn6gnGvzQ_1e7320c30a6f4b27894a54e2de50a805_brfss2013.RData")

Part 1: Data

Describe how the observations in the sample are collected, and the implications of this data collection method on the scope of inference (generalizability / causality)

Answer: This project was designed to measure behavioral risk factors for adults, 18 years and older, residing in the US. I think it uses random sampling, since calling mobile and landline numbers results in randomly selecting individuals to participate. However, I don't think the sample is completely representative, because it can only reach adults who have a phone and agree to answer. For example, there are a lot of elderly people who do not use mobile phones, which makes them more difficult to reach and therefore less likely to participate. Even landlines may reach fewer elderly people, for instance those living in retirement homes. Also, there is no random assignment: this is an observational study, so the results can show associations between variables but cannot establish causality.

Part 2: Research questions

Research question 1: Is marital status correlated with the number of days, during the past 30 days, on which mental health was not good? Our emotional life, what we are feeling or going through, can also be important for our health. I want to explore whether marital status is related to mental health.

Research question 2: Is there any relationship between the amount of time someone sleeps and the number of days, in the last 30 days, on which their mental health was not good? Lack of sleep is certainly related to physical health, but to what extent is it related to mental health? We know that poor mental health can make it difficult to sleep, but can sleeping less contribute to poor mental health?

Research question 3: Is there any relationship between the type of physical activity and general health? We know physical activity is good for overall health. But is there any type that is associated with better health?


Part 3: Exploratory data analysis

Research question 1:

ggplot(brfss2013, aes(marital, menthlth)) +
    geom_point()
## Warning: Removed 8627 rows containing missing values (geom_point).

# menthlth is numeric, so it is converted to a factor to get one fill colour per value
ggplot(data = brfss2013, aes(x = marital, fill = factor(menthlth))) +
  geom_bar()

brfss2013 %>%
    group_by(marital) %>%
    summarise(mean_mentalhlth = mean(menthlth))
## Warning: `...` is not empty.
## 
## We detected these problematic arguments:
## * `needs_dots`
## 
## These dots only exist to allow future extensions and should be empty.
## Did you misspecify an argument?
## # A tibble: 7 x 2
##   marital                         mean_mentalhlth
##   <fct>                                     <dbl>
## 1 Married                                      NA
## 2 Divorced                                     NA
## 3 Widowed                                      NA
## 4 Separated                                    NA
## 5 Never married                                NA
## 6 A member of an unmarried couple              NA
## 7 NA                                           NA
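
Every group mean above is NA because mean() returns NA as soon as one value in the group is missing. A minimal sketch of one way around this is shown below. It assumes that dropping the missing answers is acceptable for an exploratory summary, and it uses a small made-up data frame: the object names toy and toy_means and all values are invented for illustration and are not taken from brfss2013.

```r
library(dplyr)

# Toy data, invented for illustration only (not BRFSS values).
toy <- data.frame(
  marital  = factor(c("Married", "Married", "Divorced", "Divorced", NA)),
  menthlth = c(0, 4, NA, 10, 2)
)

# filter() drops respondents whose marital status is missing;
# na.rm = TRUE drops missing menthlth values before averaging.
toy_means <- toy %>%
  filter(!is.na(marital)) %>%
  group_by(marital) %>%
  summarise(
    mean_menthlth = mean(menthlth, na.rm = TRUE),
    n_answered    = sum(!is.na(menthlth))
  )

print(toy_means)
```

Reporting the number of non-missing answers next to each mean makes it easier to see how much data each group mean is based on.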

Research question 2:

ggplot(brfss2013, aes(sleptim1, menthlth)) +
    geom_point()
## Warning: Removed 15159 rows containing missing values (geom_point).

# menthlth is numeric, so it is converted to a factor to get one fill colour per value
ggplot(data = brfss2013, aes(x = sleptim1, fill = factor(menthlth))) +
  geom_bar()
## Warning: Removed 7387 rows containing non-finite values (stat_count).
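
Since research question 2 asks about a correlation, a single number can complement the plots. The sketch below is only an illustration on made-up data (the object names toy and rho and all values are invented, not BRFSS results). Using Spearman's rank correlation is an assumption on my part, chosen because menthlth is a count bounded between 0 and 30 and is unlikely to be normally distributed.

```r
# Toy data, invented for illustration only (not BRFSS values).
toy <- data.frame(
  sleptim1 = c(4, 5, 6, 7, 8, NA, 7),
  menthlth = c(20, 15, 10, 2, 0, 5, NA)
)

# use = "complete.obs" keeps only the rows where both variables are present,
# so the missing values do not turn the result into NA.
rho <- cor(toy$sleptim1, toy$menthlth,
           use = "complete.obs", method = "spearman")

print(rho)
```

A correlation from observational data like this would only describe an association. It could not tell us whether less sleep leads to worse mental health or the other way around.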

Research question 3:

ggplot(brfss2013, aes(exract11, genhlth)) +
    geom_point()

ggplot(data = brfss2013, aes(x = exract11, fill = genhlth)) +
  geom_bar()
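
Because some activity types have far more respondents than others, raw counts can hide differences in health between activities. A minimal sketch of comparing proportions within each activity type is given below. It uses a made-up data frame: the object names toy and toy_props and all values are invented for illustration and are not BRFSS results.

```r
library(dplyr)

# Toy data, invented for illustration only (not BRFSS values).
toy <- data.frame(
  exract11 = c("Walking", "Walking", "Walking", "Running", "Running"),
  genhlth  = c("Good", "Good", "Fair", "Excellent", "Good")
)

# Count each activity / health combination, then divide by the
# total for that activity so the proportions sum to 1 per activity.
toy_props <- toy %>%
  filter(!is.na(exract11), !is.na(genhlth)) %>%
  count(exract11, genhlth) %>%
  group_by(exract11) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

print(toy_props)
```

The same idea can be shown graphically by passing position = "fill" to geom_bar(), which scales every bar to the same height so that the fill colours show proportions rather than counts.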