library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.1
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.1
load("brfss2013.RData")
Generalizability: The respondents were randomly sampled, so the results can be generalized to the surveyed US population. However, there is a potential sampling bias problem, “voluntary response”: the people who choose to respond and take the survey may not be fully representative of the population.
Causality: We cannot make causal inferences, as there is no random assignment; the data are observational.
Research question 1: Do people who get enough hours of sleep tend to have better general health?
Here we are looking for an association between the genhlth and sleptim1 variables.
Research question 2: Is there a relationship between marital status and the time spent on physical activity that would positively affect health?
Here we are looking for an association between the marital and exeroft1 variables.
Research question 3: Do people with high income tend to have an ideal weight, which would reflect positively on their overall health condition?
Here we are looking for an association between the income2 and weight2 variables.
Research question 1:
brfss2013 %>%
  select(genhlth, sleptim1) %>%
  filter(!is.na(genhlth), !is.na(sleptim1)) %>%
  group_by(genhlth) %>%
  summarise(total_hours_sleep = sum(sleptim1))
## # A tibble: 5 × 2
## genhlth total_hours_sleep
## <fctr> <int>
## 1 Excellent 609843
## 2 Very good 1121172
## 3 Good 1043788
## 4 Fair 448355
## 5 Poor 179471
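From the summary statistics table we see that the top three health categories (Excellent, Very good, and Good) have more total hours of sleep than the other two categories. These totals also depend on the number of respondents in each category, so they give only a rough indication. Next we investigate this pattern further with a bar plot.
brfss2013 %>%
  filter(!is.na(genhlth), !is.na(sleptim1)) %>%
  group_by(genhlth) %>%
  summarise(total_hours_sleep = sum(sleptim1)) %>%
  ggplot(aes(x = genhlth, y = total_hours_sleep)) +
  geom_bar(stat = "identity") +
  ylab("Total Hours Slept")
Here we see that the plot shows the same pattern as the summary table.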
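Since a group total grows with the number of respondents in the group, a per-respondent average may be easier to compare across health categories. The following is only a minimal sketch of that idea: `toy` is a small, made-up data frame standing in for `brfss2013`, and the column names `n` and `mean_hours_sleep` are illustrative choices, not part of the original analysis.

```r
library(dplyr)

# Hypothetical toy data standing in for brfss2013 (values are made up)
toy <- data.frame(
  genhlth  = factor(c("Excellent", "Excellent", "Good", "Good", "Good", "Poor"),
                    levels = c("Excellent", "Good", "Poor")),
  sleptim1 = c(8L, 7L, 7L, 6L, NA, 5L)
)

# Group size and mean hours of sleep per health category,
# after dropping rows with missing values
sleep_by_health <- toy %>%
  filter(!is.na(genhlth), !is.na(sleptim1)) %>%
  group_by(genhlth) %>%
  summarise(n = n(), mean_hours_sleep = mean(sleptim1))
```

Reporting `n` next to the mean makes it visible when a large total comes mainly from a large group.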
Research question 2:
brfss2013 %>%
  select(marital, exeroft1) %>%
  filter(!is.na(marital), !is.na(exeroft1)) %>%
  group_by(marital) %>%
  summarise(total_hours_played = sum(exeroft1))
## # A tibble: 6 × 2
## marital total_hours_played
## <fctr> <int>
## 1 Married 23918499
## 2 Divorced 6183519
## 3 Widowed 5098976
## 4 Separated 854945
## 5 Never married 6935565
## 6 A member of an unmarried couple 1249341
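Separated respondents and members of unmarried couples have the lowest totals of physical activity, while the other groups have higher totals than these two groups. These totals also depend on the number of respondents in each group.
brfss2013 %>%
  filter(!is.na(marital), !is.na(exeroft1)) %>%
  group_by(marital) %>%
  summarise(total_hours_played = sum(exeroft1)) %>%
  ggplot(aes(x = marital, y = total_hours_played)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
The bar plot shows the same pattern.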
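The `exeroft1` totals are sums of the raw survey values. If those values are frequency codes rather than hours, they could be converted to a weekly frequency before summarising. The sketch below assumes the BRFSS codebook convention (not stated in this report) that 101-199 means times per week and 201-299 means times per month, and it approximates a month as 30 days. The helper name `to_times_per_week` and the example codes are hypothetical.

```r
# Hypothetical helper: convert an exeroft1-style code to times per week.
# Assumed coding: 101-199 = times per week, 201-299 = times per month.
to_times_per_week <- function(code) {
  ifelse(code >= 101 & code <= 199, code - 100,
         ifelse(code >= 201 & code <= 299, (code - 200) * 7 / 30, NA_real_))
}

codes  <- c(103, 230, 999, NA)      # made-up example codes
weekly <- to_times_per_week(codes)  # codes outside both ranges become NA
```

The converted values could then be averaged per marital status instead of summed.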
Research question 3:
table <- brfss2013 %>%
  mutate(intweight = as.integer(as.character(weight2))) %>%
  select(income2, intweight) %>%
  filter(!is.na(income2), !is.na(intweight)) %>%
  group_by(income2) %>%
  summarise(mean_weight = mean(intweight))
## Warning in eval(substitute(expr), envir, enclos): NAs introduced by
## coercion
The mean weight for each group is roughly equal, so the data do not tell us much; we can confirm this with a bar chart in the next plot.
ggplot(data = table, aes(x = income2, y = mean_weight)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ylab("Mean Weight")
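The coercion warning in Research question 3 comes from converting the factor `weight2` to integers by way of character strings: any level that is not a plain number becomes `NA`. A minimal sketch of that conversion on a made-up factor `f` follows (the level `"unknown"` is a hypothetical stand-in for non-numeric entries). It also shows why the `as.character()` step is needed.

```r
# Made-up factor with one non-numeric level
f <- factor(c("150", "200", "unknown", "175"))

# Converting the factor directly returns internal level codes, not the values
level_codes <- as.integer(f)

# Going through character first recovers the numbers; non-numeric
# entries become NA (the warning is suppressed here for a clean run)
weights <- suppressWarnings(as.integer(as.character(f)))
```

Filtering on `!is.na(intweight)` afterwards, as the analysis does, drops the entries that could not be converted.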