library(ggplot2)
library(dplyr)
# Loading this file creates a data frame named brfss2013
load("~/R/brfss2013.RData")
Research question 1: Do weight and height affect general health?
Research question 2: Can BMI indicate good or bad general health?
Research question 3: Then, what really affects general health?
Research question 1: Do weight and height affect general health?
First, plot the density of weight (in kg) for each general health group.
# wtkg3 stores weight in kg x 100, so wtkg3/100 is weight in kg
ggplot(data = brfss2013, aes(x = wtkg3/100, color = genhlth)) + geom_density() + xlim(0, 200)
## Warning: Removed 20871 rows containing non-finite values (stat_density).
Next, summarise weight within each general health group.
brfss2013 %>%
  group_by(genhlth) %>%
  summarise(mean_weight = mean(wtkg3/100, na.rm = TRUE),
            sd_weight = sd(wtkg3/100, na.rm = TRUE),
            n = n())
## Source: local data frame [6 x 4]
##
## genhlth mean_weight sd_weight n
## (fctr) (dbl) (dbl) (int)
## 1 Excellent 74.11690 16.36212 85482
## 2 Very good 78.65348 18.13169 159076
## 3 Good 82.44996 20.72351 150555
## 4 Fair 84.39674 23.37736 66726
## 5 Poor 84.05489 25.23718 27951
## 6 NA 78.14416 20.62593 1985
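In this summary, `n = n()` counts every row in each `genhlth` group, including respondents whose weight is missing, while the mean and standard deviation use only the non-missing values. In the dplyr pipeline, adding `n_weight = sum(!is.na(wtkg3))` to `summarise()` would report both counts. A minimal base-R sketch of the difference, using a small made-up data frame (`toy`, hypothetical values) instead of the real `brfss2013`:

```r
# Toy data with hypothetical values (not from brfss2013);
# wtkg3 uses the BRFSS convention of weight in kg x 100
toy <- data.frame(
  genhlth = c("Excellent", "Excellent", "Poor", "Poor", "Poor"),
  wtkg3   = c(7000, NA, 9000, 8000, NA)
)

# Rows per group: what n() reports in the summary
n_rows <- table(toy$genhlth)

# Rows per group that actually have a weight, i.e. what the mean and sd use
n_weight <- tapply(!is.na(toy$wtkg3), toy$genhlth, sum)

# Mean weight in kg, skipping missing values as na.rm = TRUE does
mean_kg <- tapply(toy$wtkg3 / 100, toy$genhlth, mean, na.rm = TRUE)
```

Here the Excellent group has 2 rows but only 1 usable weight, and the Poor group has 3 rows but only 2 usable weights.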
# Same summary for height; htm4 is height in cm, so htm4/100 is height in metres
brfss2013 %>%
  group_by(genhlth) %>%
  summarise(mean_height = mean(htm4/100, na.rm = TRUE),
            sd_height = sd(htm4/100, na.rm = TRUE),
            n = n())
## Source: local data frame [6 x 4]
##
## genhlth mean_height sd_height n
## (fctr) (dbl) (dbl) (int)
## 1 Excellent 1.702950 0.1032375 85482
## 2 Very good 1.699248 0.1170719 159076
## 3 Good 1.690567 0.1216125 150555
## 4 Fair 1.677626 0.1066154 66726
## 5 Poor 1.676436 0.1079633 27951
## 6 NA 1.679652 0.1195868 1985
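The results show that people reporting better health tend to weigh slightly less: mean weight rises from 74.1 kg in the Excellent group to about 84 kg in the Fair and Poor groups, while mean height falls only a little, from 1.70 m to 1.68 m. The standard deviations of weight (about 16 to 25 kg) are large compared with these differences, so the weight distributions of the groups overlap heavily.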
Research question 2: Can BMI indicate good or bad general health?
First, calculate BMI (kg/m^2) from weight and height and round it to a whole number.
# BMI in kg/m^2, rounded to a whole number
brfss2013 <- brfss2013 %>%
  mutate(bmi = round(wtkg3*100/htm4^2, 0))
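The expression `wtkg3*100/htm4^2` follows from the units used earlier: `wtkg3/100` is weight in kg and `htm4/100` is height in metres, so BMI = (wtkg3/100) / (htm4/100)^2 = wtkg3 * 100 / htm4^2. A quick sanity check on one made-up respondent (70 kg, 175 cm; hypothetical values, not from the data):

```r
# Hypothetical respondent in BRFSS units: weight in kg x 100, height in cm
wtkg3 <- 7000   # 70.00 kg
htm4  <- 175    # 1.75 m

# BMI in kg/m^2 written two equivalent ways
bmi_direct <- (wtkg3 / 100) / (htm4 / 100)^2
bmi_short  <- wtkg3 * 100 / htm4^2

# Rounded to a whole number, as in the mutate() call
bmi_rounded <- round(bmi_short, 0)
```

Both forms give 70 / 1.75^2, about 22.86, which rounds to 23.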
Then plot and summarise BMI by general health group, as for weight and height.
ggplot(data = brfss2013, aes(x=bmi, color=genhlth))+geom_density()+xlim(0,50)
## Warning: Removed 26631 rows containing non-finite values (stat_density).
brfss2013 %>%
  group_by(genhlth) %>%
  summarise(mean_bmi = mean(bmi, na.rm = TRUE),
            sd_bmi = sd(bmi, na.rm = TRUE),
            n = n())
## Source: local data frame [6 x 4]
##
## genhlth mean_bmi sd_bmi n
## (fctr) (dbl) (dbl) (int)
## 1 Excellent 25.39106 4.550576 85482
## 2 Very good 27.06575 5.202269 159076
## 3 Good 28.69913 6.274554 150555
## 4 Fair 29.89762 7.472695 66726
## 5 Poor 29.83684 8.342387 27951
## 6 NA 27.63672 6.964676 1985
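Again, BMI differs only modestly between the health groups: mean BMI rises from 25.4 in the Excellent group to about 29.9 in the Fair and Poor groups, but the standard deviations (4.6 to 8.3) are large enough that the groups overlap considerably.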
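Because the group means overlap so much, another way to ask whether BMI indicates health is to place respondents into the standard WHO adult BMI classes (underweight below 18.5, normal 18.5 to under 25, overweight 25 to under 30, obese 30 and above) and compare general health within each class. A base-R sketch of the binning step on a few made-up BMI values; the cut points are the WHO thresholds, and the class labels are my own choice:

```r
# Made-up BMI values for illustration (not BRFSS data)
bmi <- c(17.9, 18.5, 24.9, 25, 29.9, 30, 41, NA)

# WHO adult BMI classes; right = FALSE makes each interval [lower, upper)
bmi_class <- cut(bmi,
                 breaks = c(-Inf, 18.5, 25, 30, Inf),
                 right  = FALSE,
                 labels = c("Underweight", "Normal", "Overweight", "Obese"))
```

On the real data, `table(cut(brfss2013$bmi, ...), brfss2013$genhlth)` with the same arguments would cross-tabulate BMI class against general health. Since `bmi` here is rounded to a whole number, a value such as 24.6 becomes 25 and lands in the overweight class; binning an unrounded BMI would avoid that shift.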
Research question 3: Then, what really affects general health?
Based on these data, which groups of people tend to report better health? I start by scoring general health from 1 (Poor) to 5 (Excellent).
# Score general health from 1 (Poor) to 5 (Excellent); missing answers stay NA
brfss2013 <- brfss2013 %>%
  mutate(health_score = match(as.character(genhlth),
                              c("Poor", "Fair", "Good", "Very good", "Excellent")))
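The numeric score gives one number per group that can be compared across variables: a higher mean score means the group reports better general health on average. A base-R sketch with a made-up data frame (`toy`, hypothetical values), using the same 1 to 5 coding and averaging by `sex`:

```r
# Made-up respondents (not BRFSS data)
toy <- data.frame(
  sex     = c("Male", "Male", "Female", "Female", "Female"),
  genhlth = c("Poor", "Very good", "Excellent", "Good", NA)
)

# Score 1 (Poor) to 5 (Excellent); missing or unexpected answers become NA
health_levels <- c("Poor", "Fair", "Good", "Very good", "Excellent")
toy$health_score <- match(toy$genhlth, health_levels)

# Mean score per group, ignoring respondents with no answer
mean_score <- tapply(toy$health_score, toy$sex, mean, na.rm = TRUE)
```

In this toy example the Female group averages 4 and the Male group 2.5. On the real data, `tapply(brfss2013$health_score, brfss2013$exerany2, mean, na.rm = TRUE)` would give one average score per exercise group.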
Then I look at each variable one by one, starting with sex.
ggplot(data = brfss2013, aes(x=sex, fill=genhlth))+geom_bar()
smoke100: Smoked At Least 100 Cigarettes
ggplot(data = brfss2013, aes(x=smoke100, fill=genhlth))+geom_bar()
avedrnk2: Average Alcoholic Drinks Per Day In Past 30 Days
ggplot(data = brfss2013, aes(x=avedrnk2, fill=genhlth))+geom_bar()+xlim(0,10)
## Warning: Removed 263073 rows containing non-finite values (stat_count).
exerany2: Exercise In Past 30 Days
ggplot(data = brfss2013, aes(x=exerany2, fill=genhlth))+geom_bar()
Of these variables, exercise in the past 30 days shows the clearest link to better general health.
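The stacked bars above show counts, so a larger group (for example, respondents who exercised) has taller bars regardless of how healthy it is. Comparing the share of each `genhlth` answer within each group makes the exercise comparison more direct; in ggplot2, `geom_bar(position = "fill")` draws the same proportions as a plot. A base-R sketch on made-up data (`toy`, hypothetical values); on the real data, `table(brfss2013$exerany2, brfss2013$genhlth)` would take the place of the toy table:

```r
# Made-up respondents (not BRFSS data)
toy <- data.frame(
  exerany2 = c("Yes", "Yes", "Yes", "Yes", "No", "No"),
  genhlth  = c("Excellent", "Good", "Good", "Poor", "Good", "Poor")
)

# Counts of each general health answer within each exercise group
counts <- table(toy$exerany2, toy$genhlth)

# Row proportions: each exercise group's shares sum to 1
shares <- prop.table(counts, margin = 1)
```

Here 25% of the toy "Yes" group report Poor health against 50% of the "No" group, even though the "Yes" group has the same number of Poor answers in raw counts.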