Setup

Load packages

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.1
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.1

Load data

load("brfss2013.RData")

Part 1: Data

Generalizability: The respondents were randomly sampled, so the results generalize to the US population that was surveyed. However, there is a sampling bias problem, “voluntary response”: the people who choose to respond and complete the survey may not be fully representative of the population.

Causality: We cannot make causal inferences, because this is an observational survey with no random assignment.


Part 2: Research questions

Research question 1: Do people who get enough hours of sleep tend to have better general health?

Here we are looking for an association between the genhlth and sleptim1 variables.

Research question 2: Is there a relationship between marital status and the time spent on physical activity that would positively affect health?

Here we are looking for an association between the marital and exeroft1 variables.

Research question 3: Do people with a high income tend to have an ideal weight that reflects positively on their overall health condition?

Here we are looking for an association between the income2 and weight2 variables.


Part 3: Exploratory data analysis

Research question 1:

brfss2013 %>%
  select(genhlth, sleptim1) %>%
  filter(!is.na(genhlth), !is.na(sleptim1)) %>%
  group_by(genhlth) %>%
  summarise(total_hours_sleep = sum(sleptim1))
## # A tibble: 5 × 2
##     genhlth total_hours_sleep
##      <fctr>             <int>
## 1 Excellent            609843
## 2 Very good           1121172
## 3      Good           1043788
## 4      Fair            448355
## 5      Poor            179471

From the summary statistics table, we see that the top three health categories (Excellent, Very good, and Good) have more total hours of sleep than the other two categories. Because these are totals, they also reflect how many respondents fall in each category. Next, we investigate this pattern further with a bar plot.

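# weight = sleptim1 makes each bar the sum of hours slept rather than a respondent count
brfss2013 %>%
  filter(!is.na(genhlth), !is.na(sleptim1)) %>%
  ggplot(aes(x = genhlth, weight = sleptim1)) +
  geom_bar() +
  ylab("Total Hours Slept")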

Here we see that the plot confirms this association.
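Since a group total grows with the number of respondents in the group, it may help to report the count and the per-respondent mean next to the sum. The sketch below does this on a small invented data frame (`toy` is hypothetical and stands in for brfss2013), so the numbers are illustrative only.

```r
library(dplyr)

# Hypothetical toy data standing in for brfss2013; all values are invented.
toy <- data.frame(
  genhlth  = factor(c("Good", "Good", "Good", "Good", "Poor", "Poor"),
                    levels = c("Good", "Poor")),
  sleptim1 = c(6L, 7L, 6L, 7L, 8L, 8L)
)

sleep_summary <- toy %>%
  filter(!is.na(genhlth), !is.na(sleptim1)) %>%
  group_by(genhlth) %>%
  summarise(
    n                 = n(),             # respondents in the category
    total_hours_sleep = sum(sleptim1),   # grows with n
    mean_hours_sleep  = mean(sleptim1)   # comparable across categories
  )

print(sleep_summary)
```

In this toy example the "Good" group has the larger total only because it has more rows, while its mean is lower. Running the same summarise() call on brfss2013 would show whether the pattern in the totals also holds per respondent.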

Research question 2:

brfss2013 %>%
  select(marital, exeroft1) %>%
  filter(!is.na(marital), !is.na(exeroft1)) %>%
  group_by(marital) %>%
  summarise(total_hours_played = sum(exeroft1))
## # A tibble: 6 × 2
##                           marital total_hours_played
##                            <fctr>              <int>
## 1                         Married           23918499
## 2                        Divorced            6183519
## 3                         Widowed            5098976
## 4                       Separated             854945
## 5                   Never married            6935565
## 6 A member of an unmarried couple            1249341

Respondents who are separated or a member of an unmarried couple performed the least physical activity in total, while the other groups did well compared to these two groups.

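# weight = exeroft1 makes each bar the sum of exeroft1 rather than a respondent count
brfss2013 %>%
  filter(!is.na(marital), !is.na(exeroft1)) %>%
  ggplot(aes(x = marital, weight = exeroft1)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))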

The bar plot clearly confirms this finding.
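One caveat when summing exeroft1: to our understanding of the BRFSS 2013 codebook, this variable is a coded frequency rather than a number of hours, with 101-199 meaning times per week and 201-299 meaning times per month. If the copy of brfss2013 used here keeps that coding (an assumption worth checking against the codebook), the codes could be converted to a common unit before summarising. The helper `times_per_week`, the 30-day month, and the toy data below are all hypothetical.

```r
library(dplyr)

# Hypothetical helper: convert BRFSS-style frequency codes to times per week.
# Assumes 101-199 = times per week and 201-299 = times per month
# (with a 30-day month); any other value is treated as missing.
times_per_week <- function(code) {
  case_when(
    code >= 101 & code <= 199 ~ code - 100,
    code >= 201 & code <= 299 ~ (code - 200) / (30 / 7),
    TRUE ~ NA_real_
  )
}

# Invented toy data standing in for brfss2013.
toy <- data.frame(
  marital  = factor(c("Married", "Married", "Separated")),
  exeroft1 = c(103L, 212L, 107L)
)

activity_summary <- toy %>%
  filter(!is.na(marital), !is.na(exeroft1)) %>%
  mutate(weekly = times_per_week(exeroft1)) %>%
  group_by(marital) %>%
  summarise(mean_times_per_week = mean(weekly, na.rm = TRUE))

print(activity_summary)
```

A per-respondent mean in a common unit is easier to compare across marital groups than a sum of raw codes, which mixes weekly and monthly values and depends on group size.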

Research question 3:

# Convert weight2 to integer via character; non-numeric entries become NA
table <- brfss2013 %>%
  mutate(intweight = as.integer(as.character(weight2))) %>%
  select(income2, intweight) %>%
  filter(!is.na(income2), !is.na(intweight)) %>%
  group_by(income2) %>%
  summarise(mean_weight = mean(intweight))
## Warning in eval(substitute(expr), envir, enclos): NAs introduced by
## coercion
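The coercion warning above is what we would expect if weight2 is stored as a factor whose levels include entries that are not plain numbers: as.integer(as.character(x)) turns those entries into NA. A minimal illustration on an invented factor (the values below are made up and are not taken from brfss2013):

```r
# Invented example values; not taken from brfss2013.
weight_raw <- factor(c("150", "200", "unknown", "175"))

# as.integer() directly on a factor returns internal level codes, not the
# weights, which is why the conversion goes through as.character() first.
level_codes <- as.integer(weight_raw)

# Non-numeric text cannot be parsed and becomes NA (normally with a warning,
# suppressed here so the example runs quietly).
weight_int <- suppressWarnings(as.integer(as.character(weight_raw)))

n_failed <- sum(is.na(weight_int))
print(weight_int)
print(n_failed)
```

Counting the NAs this way shows how many rows would be dropped by the later filter(!is.na(intweight)) step.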

The mean weight for each income group is roughly equal, so the data doesn’t tell us much. We can confirm this with a bar chart in the next plot.

ggplot(data = table, aes(x = income2, y = mean_weight)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ylab("Mean Weight")