Setup

Load packages

library(ggplot2)
library(dplyr)

Load data

Make sure the data file and the R Markdown file are in the same directory. When loaded, the data frame will be called brfss2013.

load("brfss2013.RData")
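A quick check after loading can confirm that the variables used later in this report are present. This is only a minimal sketch: needed_vars and missing_vars are hypothetical names introduced here, and the column list is taken from the code in Part 3.

```r
# Columns referenced by the analysis code in Part 3 of this report.
needed_vars <- c("sleptim1", "genhlth", "hlthpln1", "income2",
                 "toldhi2", "exeroft1", "marital")

# Hypothetical helper: returns the names in `vars` that are absent from `df`.
missing_vars <- function(df, vars = needed_vars) {
  setdiff(vars, names(df))
}

# Only runs once the data frame has actually been loaded.
if (exists("brfss2013")) {
  print(dim(brfss2013))            # number of rows and columns
  print(missing_vars(brfss2013))   # character(0) if every column is present
}
```

An empty result means every column the later chunks rely on is available.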

Part 1: Data

The study conducts both a landline telephone survey and a cellular telephone survey. Adults in the landline survey are selected randomly. However, the sample might still be biased, and two factors should be taken into consideration.

The first is when the interviews are conducted. Different groups of people tend to be at home at different times. For example, if the survey is conducted during the daytime, the respondent is more likely to be someone who doesn’t have a job, such as the elderly, whereas more young adults would be reached at night.

The second is the willingness of the respondents. Since the survey is conducted by phone, people can choose whether to participate. People who take the survey might be less busy or more inclined to help out than those who decline. After all, kindness might be a confounding variable for personal health.


Part 2: Research questions

Research question 1: Is there a relationship between sleep habits and general health condition? We might also want to see whether there is a relationship between exercise habits and sleeping hours.

Research question 2: How does one’s personal financial situation affect one’s general health? Here, we assume that health care access is positively related to personal financial situation. We will compare the health level of people who have access to health care coverage with that of people who do not. Furthermore, we will study the general health of those who couldn’t afford to see a doctor. Finally, we will look at the relationship between income and general health.

Research question 3: Do people who have cholesterol awareness tend to exercise more? We will compare the minutes they spend walking, running, jogging, and swimming with whether they have been informed of any kind of health impairment. We will then end the analysis with the relationship between marital status and health condition.


Part 3: Exploratory data analysis

Research question 1:

ggplot(data = brfss2013, aes(x = sleptim1, y=genhlth)) +
  xlim(0,24)+
  geom_point()
## Warning: Removed 7389 rows containing missing values (geom_point).

The plot of sleep time against general health condition shows no apparent correlation. Whatever a person’s sleep time is, their reported health condition does not seem to differ. We will look closer by examining the frequency of each sleep time within each health category.

excehlth <- brfss2013 %>%
  filter(genhlth == "Excellent")
ggplot(data = excehlth, aes(x =sleptim1)) +
  geom_bar()
## Warning: Removed 660 rows containing non-finite values (stat_count).

verygoodhlth <- brfss2013 %>%
  filter(genhlth == "Very good")
ggplot(data = verygoodhlth, aes(x =sleptim1)) +
  geom_bar()
## Warning: Removed 1243 rows containing non-finite values (stat_count).

goodhlth <- brfss2013 %>%
  filter(genhlth == "Good")
ggplot(data = goodhlth, aes(x =sleptim1)) +
  geom_bar()
## Warning: Removed 2256 rows containing non-finite values (stat_count).

fairhlth <- brfss2013 %>%
  filter(genhlth == "Fair")
ggplot(data = fairhlth, aes(x =sleptim1)) +
  geom_bar()
## Warning: Removed 1714 rows containing non-finite values (stat_count).

poorhlth <- brfss2013 %>%
  filter(genhlth == "Poor")
ggplot(data = poorhlth, aes(x =sleptim1)) +
  geom_bar()
## Warning: Removed 1312 rows containing non-finite values (stat_count).
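The five bar charts above are easier to compare with a numeric summary next to them. The following is a minimal sketch, not part of the original analysis: sleep_by_health is a hypothetical helper that reports the center and spread of sleep hours per health category. It drops missing values and, mirroring the xlim(0, 24) used earlier, ignores sleep times outside 0 to 24 hours.

```r
library(dplyr)

# Hypothetical helper: one row per health category with the centre and
# spread of reported sleep hours.
sleep_by_health <- function(df) {
  df %>%
    # Drop missing values and implausible sleep times (outside 0-24 hours).
    filter(!is.na(genhlth), !is.na(sleptim1),
           sleptim1 >= 0, sleptim1 <= 24) %>%
    group_by(genhlth) %>%
    summarise(
      n      = n(),
      mean   = mean(sleptim1),
      median = median(sleptim1),
      sd     = sd(sleptim1),
      iqr    = IQR(sleptim1)
    )
}

# Only runs once the data frame has actually been loaded.
if (exists("brfss2013")) {
  print(sleep_by_health(brfss2013))
}
```

A smaller sd or iqr in one row than in another would indicate a more concentrated distribution of sleep time for that health category.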

We can see from the charts that people in different health conditions tend to have similar distributions of sleeping time. The majority of sleeping times fall between six and eight hours, which is quite common. The other thing to notice is that people in excellent health seem to have a more concentrated distribution than those in poor health. We might postulate that people in excellent health have more discipline in their daily lives.

Research question 2:

ggplot(data = brfss2013, aes(x =hlthpln1, y = genhlth)) +
  geom_count()

Again, we look at each health care access category separately.

yeshlthpln <- brfss2013 %>%
  filter(hlthpln1 == "Yes")
ggplot(data = yeshlthpln, aes(x =genhlth)) +
  geom_bar()

nohlthpln <- brfss2013 %>%
  filter(hlthpln1 == "No")
ggplot(data = nohlthpln, aes(x =genhlth)) +
  geom_bar()
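The two bar charts above show raw counts, which are hard to compare if the two groups differ in size. A minimal sketch of one alternative is to convert the counts to proportions within each coverage group. health_share_by_plan is a hypothetical helper name introduced here; it uses only base R.

```r
# Hypothetical helper: share of each general-health rating within each
# coverage group, so every row sums to 1. table() excludes missing
# values by default.
health_share_by_plan <- function(df) {
  counts <- table(coverage = df$hlthpln1, health = df$genhlth)
  prop.table(counts, margin = 1)  # margin = 1 normalises within rows
}

# Only runs once the data frame has actually been loaded.
if (exists("brfss2013")) {
  print(round(health_share_by_plan(brfss2013), 3))
}
```

Reading across a row gives the health distribution for one coverage group, independent of how many respondents fall into that group.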

We can see an obvious right skew in the distribution for people who have access to health care coverage. It can be said that the most common rating among people who have access is “very good”, compared with “good” for people who don’t have access. Now we compare people’s health directly with their income.

ggplot(data = brfss2013, aes(x =income2, y = genhlth)) +
  geom_count()

It’s hard to read much from the chart, except that the lower right region appears to hold a larger share of the respondents. We analyze each health condition separately to see its income distribution.

ggplot(data = excehlth, aes(x =income2)) +
  geom_bar()

ggplot(data = verygoodhlth, aes(x =income2)) +
  geom_bar()

ggplot(data = goodhlth, aes(x =income2)) +
  geom_bar()

ggplot(data = fairhlth, aes(x =income2)) +
  geom_bar()

ggplot(data = poorhlth, aes(x =income2)) +
  geom_bar()
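The five income charts above can also be condensed into a single figure. The sketch below is one possible way to do that, not the method used in this report: plot_health_by_income is a hypothetical helper that draws one stacked bar per income bracket and rescales each bar to height 1 with position = "fill", so the mix of health ratings can be compared across brackets of different sizes.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: one stacked bar per income bracket, each rescaled
# to a total height of 1 so the health mix is comparable across brackets.
plot_health_by_income <- function(df) {
  df %>%
    filter(!is.na(income2), !is.na(genhlth)) %>%  # drop missing values
    ggplot(aes(x = income2, fill = genhlth)) +
    geom_bar(position = "fill") +
    labs(x = "Income bracket", y = "Proportion of respondents",
         fill = "General health") +
    coord_flip()  # horizontal bars keep long bracket labels readable
}

# Only runs once the data frame has actually been loaded.
if (exists("brfss2013")) {
  print(plot_health_by_income(brfss2013))
}
```

Because every bar has the same height, a shift in the colored segments from one bracket to the next reflects a change in the health distribution rather than a change in sample size.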

Now it is clear that people with higher income levels tend to report better health. We also notice that there are fewer respondents in the lower income brackets. That might be because they have less access to cell phones and landline phones. It might also be that they spend much of their time struggling to get by and don’t have spare time to take the survey.

Research question 3:

ggplot(data = brfss2013, aes(x =toldhi2, y=exeroft1)) +
  geom_count()
## Warning: Removed 164165 rows containing non-finite values (stat_sum).

yestoldhi <- brfss2013 %>%
  filter(toldhi2 == "Yes")
ggplot(data = yestoldhi, aes(x =exeroft1)) +
  geom_bar()
## Warning: Removed 65837 rows containing non-finite values (stat_count).

notoldhi <- brfss2013 %>%
  filter(toldhi2 == "No")
ggplot(data = notoldhi, aes(x =exeroft1)) +
  geom_bar()
## Warning: Removed 71222 rows containing non-finite values (stat_count).

We see that there is not much difference between the two charts, and we might conclude that people’s exercise behavior is not affected by their awareness of their blood cholesterol level.
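One way to check this impression numerically is to put exeroft1 on a single scale before comparing the two groups. The sketch below rests on an assumption that does not come from this report: that exeroft1 follows the BRFSS codebook convention of a coded frequency, with 101 to 199 meaning times per week and 201 to 299 meaning times per month. If the variable is coded differently in your copy of the data, the conversion does not apply. weekly_sessions and exercise_by_cholesterol are hypothetical helper names.

```r
library(dplyr)

# ASSUMPTION (BRFSS codebook convention, not stated in this report):
# 101-199 = times per week, 201-299 = times per month.
# Any other value, including NA, is treated as missing.
weekly_sessions <- function(code) {
  weeks_per_month <- 30 / 7  # assumed conversion factor
  case_when(
    code >= 101 & code <= 199 ~ code - 100,
    code >= 201 & code <= 299 ~ (code - 200) / weeks_per_month,
    TRUE ~ NA_real_
  )
}

# Hypothetical helper: typical weekly exercise frequency per toldhi2 group.
exercise_by_cholesterol <- function(df) {
  df %>%
    mutate(per_week = weekly_sessions(exeroft1)) %>%
    filter(!is.na(toldhi2), !is.na(per_week)) %>%
    group_by(toldhi2) %>%
    summarise(n = n(), median = median(per_week), mean = mean(per_week))
}

# Only runs once the data frame has actually been loaded.
if (exists("brfss2013")) {
  print(exercise_by_cholesterol(brfss2013))
}
```

Similar medians and means in the two rows would be consistent with the visual impression from the two bar charts.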

ggplot(data = brfss2013, aes(x =marital, y=genhlth)) +
  geom_count()

married <- brfss2013 %>%
  filter(marital == "Married")
ggplot(data = married, aes(x =genhlth)) +
  geom_bar()

divorced <- brfss2013 %>%
  filter(marital == "Divorced")
ggplot(data = divorced, aes(x =genhlth)) +
  geom_bar()

separated <- brfss2013 %>%
  filter(marital == "Separated")
ggplot(data = separated, aes(x =genhlth)) +
  geom_bar()

widowed <- brfss2013 %>%
  filter(marital == "Widowed")
ggplot(data = widowed, aes(x =genhlth)) +
  geom_bar()

neverm <- brfss2013 %>%
  filter(marital == "Never married")
ggplot(data = neverm, aes(x =genhlth)) +
  geom_bar()

From the charts above we can see that there is no obvious relationship between marital status and health condition. The only noticeable characteristic is a right skew in the married, divorced, and never married groups. People who are widowed or separated have a more symmetric, roughly normal distribution. We might postulate that people in an entangled and unresolved relationship tend to be less healthy. A psychological reason might cause their lower health condition, but we cannot be certain about it.
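The difference in shape between the groups can be reduced to one number per marital status. As a rough sketch under that framing, the hypothetical helper fair_poor_by_marital below computes the share of respondents in each group who rate their health as "Fair" or "Poor" and sorts the groups by that share.

```r
library(dplyr)

# Hypothetical helper: share of "Fair" or "Poor" health ratings within
# each marital status, sorted from the highest share to the lowest.
fair_poor_by_marital <- function(df) {
  df %>%
    filter(!is.na(marital), !is.na(genhlth)) %>%  # drop missing values
    group_by(marital) %>%
    summarise(
      n = n(),
      share_fair_poor = mean(genhlth %in% c("Fair", "Poor"))
    ) %>%
    arrange(desc(share_fair_poor))
}

# Only runs once the data frame has actually been loaded.
if (exists("brfss2013")) {
  print(fair_poor_by_marital(brfss2013))
}
```

A higher share for the widowed and separated groups would match the more symmetric bar charts seen above, although it would still say nothing about the cause.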