Setup

Load packages

library(ggplot2)
library(dplyr)

Load data


Part 1: Data

load("~/R/brfss2013.RData")

Part 2: Research questions

Research question 1: Do weight and height affect whether general health is good or bad?

Research question 2: Can BMI indicate good or bad health?

Research question 3: What really affects general health?


Part 3: Exploratory data analysis

Research question 1: Do weight and height affect whether general health is good or bad?

First, plot the density of weight (in kg) for each general health category.

ggplot(data = brfss2013, aes(x=wtkg3/100, color=genhlth))+geom_density()+xlim(0,200)
## Warning: Removed 20871 rows containing non-finite values (stat_density).

Then summarise weight (in kg) and height (in m) by general health.

brfss2013 %>%
  group_by(genhlth) %>%
  summarise(mean_weight = mean(wtkg3 / 100, na.rm = TRUE), sd_weight = sd(wtkg3 / 100, na.rm = TRUE), n = n())
## Source: local data frame [6 x 4]
## 
##     genhlth mean_weight sd_weight      n
##      (fctr)       (dbl)     (dbl)  (int)
## 1 Excellent    74.11690  16.36212  85482
## 2 Very good    78.65348  18.13169 159076
## 3      Good    82.44996  20.72351 150555
## 4      Fair    84.39674  23.37736  66726
## 5      Poor    84.05489  25.23718  27951
## 6        NA    78.14416  20.62593   1985
brfss2013 %>%
  group_by(genhlth) %>%
  summarise(mean_height = mean(htm4 / 100, na.rm = TRUE), sd_height = sd(htm4 / 100, na.rm = TRUE), n = n())
## Source: local data frame [6 x 4]
## 
##     genhlth mean_height sd_height      n
##      (fctr)       (dbl)     (dbl)  (int)
## 1 Excellent    1.702950 0.1032375  85482
## 2 Very good    1.699248 0.1170719 159076
## 3      Good    1.690567 0.1216125 150555
## 4      Fair    1.677626 0.1066154  66726
## 5      Poor    1.676436 0.1079633  27951
## 6        NA    1.679652 0.1195868   1985

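The results show that respondents in better health tend to weigh somewhat less: mean weight rises from about 74.1 kg in the Excellent group to about 84 kg in the Fair and Poor groups. The standard deviations (16.4 to 25.2 kg) are large relative to that gap, so the groups overlap considerably. Mean height differs by less than 3 cm across groups (about 1.70 m for Excellent versus 1.68 m for Poor).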
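Note that the n column in the summaries above comes from n(), which counts every respondent in a group, including those whose weight or height is missing and who therefore do not contribute to the mean. A minimal sketch of reporting both counts, using a small made-up data frame (toy and n_valid are illustrative names, not part of the original analysis):

```r
library(dplyr)

# Toy data standing in for brfss2013 (values are made up for illustration).
toy <- data.frame(
  genhlth = factor(c("Good", "Good", "Good", "Poor", "Poor")),
  wtkg3   = c(7000, NA, 8000, 9000, NA)  # weight stored as kg * 100
)

weight_summary <- toy %>%
  group_by(genhlth) %>%
  summarise(
    mean_weight = mean(wtkg3 / 100, na.rm = TRUE),
    n           = n(),                  # all rows in the group
    n_valid     = sum(!is.na(wtkg3))    # rows that actually enter the mean
  )

print(weight_summary)
```

In this toy example the Good group has n = 3 but only n_valid = 2 values behind its mean.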

Research question 2: Can BMI indicate good or bad health?

First, calculate BMI (weight in kg divided by height in m squared) and round it to the nearest whole number.

brfss2013 <- brfss2013 %>%
  mutate( bmi = round(wtkg3*100/htm4^2,0))
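The factor of 100 in the formula above comes from the units. Assuming, as the divisions by 100 earlier in this report suggest, that wtkg3 holds kilograms times 100 and htm4 holds centimetres, the compact expression equals weight in kg divided by height in m squared. A quick check with made-up values (bmi_explicit and bmi_compact are illustrative names):

```r
# Made-up example values in the same units as the dataset columns.
wtkg3 <- 7000   # 70.00 kg, stored as kg * 100
htm4  <- 175    # 1.75 m, stored in cm

bmi_explicit <- (wtkg3 / 100) / (htm4 / 100)^2   # kg / m^2, written out
bmi_compact  <- wtkg3 * 100 / htm4^2             # form used in the mutate() call

# Both forms agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(bmi_explicit, bmi_compact)))

bmi_rounded <- round(bmi_compact, 0)
print(bmi_rounded)
```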

Then plot the density of BMI and summarise it by general health.

ggplot(data = brfss2013, aes(x=bmi, color=genhlth))+geom_density()+xlim(0,50)
## Warning: Removed 26631 rows containing non-finite values (stat_density).

brfss2013 %>%
  group_by(genhlth) %>%
  summarise(mean_bmi = mean(bmi, na.rm = TRUE), sd_bmi = sd(bmi, na.rm = TRUE), n = n())
## Source: local data frame [6 x 4]
## 
##     genhlth mean_bmi   sd_bmi      n
##      (fctr)    (dbl)    (dbl)  (int)
## 1 Excellent 25.39106 4.550576  85482
## 2 Very good 27.06575 5.202269 159076
## 3      Good 28.69913 6.274554 150555
## 4      Fair 29.89762 7.472695  66726
## 5      Poor 29.83684 8.342387  27951
## 6        NA 27.63672 6.964676   1985

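Again, BMI differs only modestly between the good and poor health groups: mean BMI rises from about 25.4 in the Excellent group to about 29.8 to 29.9 in the Fair and Poor groups, while the standard deviations (4.6 to 8.3) are large enough that the distributions overlap substantially.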

Research question 3: What really affects general health?

Based on these data, which groups of people tend to be in better health? I start by scoring general health from 1 (Poor) to 5 (Excellent).

brfss2013 <- brfss2013 %>%
  mutate(health_score = ifelse(is.na(genhlth),genhlth,ifelse(genhlth=="Poor",1,ifelse(genhlth=="Fair",2,ifelse(genhlth=="Good", 3, ifelse(genhlth=="Very good",4,5))))))
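The nested ifelse() calls above can be hard to read. One possible alternative, shown here only as a sketch on a small made-up factor, is a named lookup vector that maps each label to its score; score_map is an illustrative name and is not part of the original analysis.

```r
# Toy factor with the five response levels plus one missing value (made up).
genhlth <- factor(
  c("Poor", "Excellent", NA, "Very good", "Fair", "Good"),
  levels = c("Excellent", "Very good", "Good", "Fair", "Poor")
)

# Named lookup: label -> score. Indexing with NA returns NA,
# so missing responses stay missing.
score_map <- c("Poor" = 1, "Fair" = 2, "Good" = 3, "Very good" = 4, "Excellent" = 5)
health_score <- unname(score_map[as.character(genhlth)])

print(health_score)
```

Because the lookup is keyed by label rather than by factor level order, it does not depend on how the levels happen to be sorted.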

Then look at each variable in turn, starting with sex.

ggplot(data = brfss2013, aes(x=sex, fill=genhlth))+geom_bar()

smoke100: Smoked At Least 100 Cigarettes

ggplot(data = brfss2013, aes(x=smoke100, fill=genhlth))+geom_bar()

avedrnk2: Avg Alcoholic Drinks Per Day In Past 30

ggplot(data = brfss2013, aes(x=avedrnk2, fill=genhlth))+geom_bar()+xlim(0,10)
## Warning: Removed 263073 rows containing non-finite values (stat_count).

exerany2: Exercise In Past 30 Days

ggplot(data = brfss2013, aes(x=exerany2, fill=genhlth))+geom_bar()

Of these variables, exercise in the past 30 days shows the clearest association with general health in the plots.
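Because the bar charts above show raw counts, groups of different sizes are hard to compare by eye. One option is to pass position = "fill" to geom_bar() so each bar is scaled to proportions; another is to compute the share of each health category within each group directly. A minimal sketch of the second approach on made-up data (toy, health_share and share are illustrative names):

```r
library(dplyr)

# Toy data standing in for brfss2013 (values are made up for illustration).
toy <- data.frame(
  exerany2 = c("Yes", "Yes", "Yes", "Yes", "No", "No"),
  genhlth  = c("Good", "Good", "Good", "Poor", "Good", "Poor")
)

health_share <- toy %>%
  count(exerany2, genhlth) %>%       # rows per exercise/health combination
  group_by(exerany2) %>%
  mutate(share = n / sum(n)) %>%     # share within each exercise group
  ungroup()

print(health_share)
```

Comparing the share column across groups makes the size of any difference explicit rather than relying on the relative heights of stacked bars.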