Setup

Load packages

library(tidyverse)
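
The analysis in Part 3 uses an object named brfss2013, but the step that loads it is not shown. A minimal sketch of that step, assuming the data were distributed as an RData file that contains an object of that name; the file name below is a hypothetical placeholder and may differ from the instructor's file.

```r
# Hypothetical file name; change it to wherever the course data file is stored.
data_file <- "brfss2013.RData"

if (file.exists(data_file)) {
  # Assumption: the file contains a data frame named brfss2013
  load(data_file)
} else {
  message("Data file not found: ", data_file)
}
```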

Part 1: Data

The data were provided by the course instructor.
The Behavioral Risk Factor Surveillance System (BRFSS) is a collaborative project between all of the states in the United States (US) and participating US territories and the Centers for Disease Control and Prevention (CDC). The BRFSS goal is to collect uniform, state-specific data on preventive health practices and risk behaviors that are linked to chronic diseases, injuries, and preventable infectious diseases that affect the adult population. (Source: BRFSS codebook provided by the instructor.)

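Participants were chosen at random and interviewed by telephone. Because respondents were not randomly assigned to any condition, the results describe associations and cannot establish causation.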


Part 2: Research questions

Research question 1:

How does sleep time vary across different groups?
For sex, we will check whether sleep time differs between men and women.
In the same way, we will check whether sleep time differs across the following groups:
marital status, education, income, employment status, and internet use.

Variables to be explored:
sleptim1 - On average, how many hours of sleep do you get in a 24-hour period?
sex - Respondent's sex
marital - Marital status
educa - Education level
income2 - Income level
employ1 - Employment status
internet - Internet use in the past 30 days

Research question 2:

How does general health differ across education levels?
We will also look at how the number of days of poor physical health differs across education levels.

Variables to be explored:
educa - Education level
genhlth - General health
physhlth - Number of days physical health was not good

Research question 3:

How does stroke occurrence differ by sex and income level?

Variables to be explored:
cvdstrk3 - Ever diagnosed with a stroke
sex - Respondent's sex
income2 - Income level


Part 3: Exploratory data analysis

Research Question 01

First, we create a data frame (df) that contains all the columns used in the analysis.

df <- select(brfss2013, 
             sleptim1,
             sex,
             marital,
             educa,
             income2,
             employ1,
             internet,
             genhlth,
             physhlth,
             cvdstrk3)
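
The summaries that follow filter out missing answers, so it can help to see how many each selected column has. A minimal sketch, not part of the original analysis; count_missing is a hypothetical helper name.

```r
library(dplyr)

# Hypothetical helper: number of missing values in each column of a data frame.
count_missing <- function(data) {
  data %>%
    summarise(across(everything(), ~ sum(is.na(.x))))
}
```

Calling count_missing(df) would return a one-row tibble with one count per column.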

Research question 1:

Sleep summary statistics

Sleep time vs marital status

sleep_marital_summary <- select(df, sleptim1, marital) %>% 
    # Drop missing answers explicitly and keep sleep values of at most 24 hours
    filter(!is.na(sleptim1), sleptim1 <= 24,
               !is.na(marital)) %>% 
    group_by(marital) %>% 
    summarise(sleep_time_mean = mean(sleptim1), 
              sd = sd(sleptim1),
              median = median(sleptim1), 
              min = min(sleptim1), 
              max = max(sleptim1), 
              Q1 = quantile(sleptim1, probs = 0.25), 
              Q3 = quantile(sleptim1, probs = 0.75),
              IQR = IQR(sleptim1))

sleep_marital_summary
## # A tibble: 6 x 9
##   marital       sleep_time_mean    sd median   min   max    Q1    Q3   IQR
##   <fct>                   <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Married                  7.08  1.31      7     1    24     6     8     2
## 2 Divorced                 6.91  1.62      7     1    24     6     8     2
## 3 Widowed                  7.22  1.61      7     1    24     6     8     2
## 4 Separated                6.69  1.84      7     1    24     6     8     2
## 5 Never married            7.00  1.58      7     1    24     6     8     2
## 6 A member of …            6.96  1.47      7     1    22     6     8     2

Sleep time vs education level

options(width = 100)

sleep_education_summary <- select(df, sleptim1, educa) %>% 
    # Drop missing answers explicitly and keep sleep values of at most 24 hours
    filter(!is.na(sleptim1), sleptim1 <= 24,
               !is.na(educa)) %>% 
    group_by(educa) %>% 
    summarise(sleep_time_mean = mean(sleptim1), 
              sd = sd(sleptim1),
              median = median(sleptim1), 
              min = min(sleptim1), 
              max = max(sleptim1), 
              Q1 = quantile(sleptim1, probs = 0.25), 
              Q3 = quantile(sleptim1, probs = 0.75),
              IQR = IQR(sleptim1)) %>% 
    print()
## # A tibble: 6 x 9
##   educa                                  sleep_time_mean    sd median   min   max    Q1    Q3   IQR
##   <fct>                                            <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Never attended school or only kinderg…            7.02  2.31      7     1    24     6     8     2
## 2 Grades 1 through 8 (Elementary)                   7.09  1.95      7     1    24     6     8     2
## 3 Grades 9 though 11 (Some high school)             6.99  1.95      7     1    24     6     8     2
## 4 Grade 12 or GED (High school graduate)            7.05  1.60      7     1    24     6     8     2
## 5 College 1 year to 3 years (Some colle…            6.99  1.45      7     1    24     6     8     2
## 6 College 4 years or more (College grad…            7.10  1.20      7     1    24     6     8     2

Sleep time vs income level

options(width = 100)
sleep_income_summary <- select(df, sleptim1, income2) %>% 
    # Drop missing answers explicitly and keep sleep values of at most 24 hours
    filter(!is.na(sleptim1), sleptim1 <= 24,
               !is.na(income2)) %>% 
    group_by(income2) %>% 
    summarise(sleep_time_mean = mean(sleptim1), 
              sd = sd(sleptim1),
              median = median(sleptim1), 
              min = min(sleptim1), 
              max = max(sleptim1), 
              Q1 = quantile(sleptim1, probs = 0.25), 
              Q3 = quantile(sleptim1, probs = 0.75),
              IQR = IQR(sleptim1))

sleep_income_summary
## # A tibble: 8 x 9
##   income2           sleep_time_mean    sd median   min   max    Q1    Q3   IQR
##   <fct>                       <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Less than $10,000            6.86  2.06      7     1    24     6     8     2
## 2 Less than $15,000            6.95  1.90      7     0    22     6     8     2
## 3 Less than $20,000            7.03  1.77      7     1    24     6     8     2
## 4 Less than $25,000            7.04  1.62      7     1    24     6     8     2
## 5 Less than $35,000            7.07  1.47      7     1    24     6     8     2
## 6 Less than $50,000            7.07  1.35      7     1    24     6     8     2
## 7 Less than $75,000            7.05  1.23      7     1    24     6     8     2
## 8 $75,000 or more              7.05  1.12      7     1    24     6     8     2

Sleep time vs employment status

sleep_employment_summary <- select(df, sleptim1, employ1) %>% 
    # Drop missing answers explicitly and keep sleep values of at most 24 hours
    filter(!is.na(sleptim1), sleptim1 <= 24,
               !is.na(employ1)) %>% 
    group_by(employ1) %>% 
    summarise(sleep_time_mean = mean(sleptim1), 
              sd = sd(sleptim1),
              median = median(sleptim1), 
              min = min(sleptim1), 
              max = max(sleptim1), 
              Q1 = quantile(sleptim1, probs = 0.25), 
              Q3 = quantile(sleptim1, probs = 0.75),
              IQR = IQR(sleptim1))

sleep_employment_summary
## # A tibble: 8 x 9
##   employ1                          sleep_time_mean    sd median   min   max    Q1    Q3   IQR
##   <fct>                                      <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Employed for wages                          6.89  1.21      7     1    24     6     8     2
## 2 Self-employed                               7.08  1.26      7     1    22     6     8     2
## 3 Out of work for 1 year or more              6.91  1.85      7     1    24     6     8     2
## 4 Out of work for less than 1 year            6.98  1.67      7     0    24     6     8     2
## 5 A homemaker                                 7.19  1.46      7     1    24     6     8     2
## 6 A student                                   7.07  1.38      7     1    24     6     8     2
## 7 Retired                                     7.35  1.47      7     1    24     6     8     2
## 8 Unable to work                              6.75  2.31      6     1    24     5     8     3
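
The four summaries above repeat one pipeline and change only the grouping column. A minimal sketch of how that pipeline could be wrapped in a function; summarise_sleep_by is a hypothetical helper that is not part of the original analysis, and it assumes the data frame has a sleptim1 column.

```r
library(dplyr)

# Hypothetical helper (not in the original analysis): summarise sleep time
# for any grouping column of a data frame that has a `sleptim1` column.
summarise_sleep_by <- function(data, group_var) {
  data %>%
    # Drop missing answers and keep sleep values of at most 24 hours
    filter(!is.na(sleptim1), sleptim1 <= 24, !is.na({{ group_var }})) %>%
    group_by({{ group_var }}) %>%
    summarise(
      sleep_time_mean = mean(sleptim1),
      sd = sd(sleptim1),
      median = median(sleptim1),
      min = min(sleptim1),
      max = max(sleptim1),
      Q1 = quantile(sleptim1, probs = 0.25),
      Q3 = quantile(sleptim1, probs = 0.75),
      IQR = IQR(sleptim1),
      .groups = "drop"
    )
}
```

With this helper, summarise_sleep_by(df, sex) and summarise_sleep_by(df, internet) would produce the same kind of table for the two remaining groups named in research question 1.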

Sleep time plot
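
No plot code or figure appears under this heading. A minimal sketch of one way to draw sleep time by group, using the same filtering as the summary tables; plot_sleep_by is a hypothetical helper, and the choice of a box plot is an assumption rather than the author's original figure.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper (not in the original analysis): box plot of sleep time
# by a grouping column, for a data frame that has a `sleptim1` column.
plot_sleep_by <- function(data, group_var) {
  data %>%
    # Drop missing answers and keep sleep values of at most 24 hours
    filter(!is.na(sleptim1), sleptim1 <= 24, !is.na({{ group_var }})) %>%
    ggplot(aes(x = {{ group_var }}, y = sleptim1)) +
    geom_boxplot() +
    coord_flip() +  # horizontal boxes keep long category labels readable
    labs(x = NULL, y = "Hours of sleep in a 24-hour period")
}
```

For example, plot_sleep_by(df, employ1) would draw one box per employment status.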

Research Question 02