Load packages

library(ggplot2)
library(dplyr)
library(gridExtra)
load("brfss2013.RData")

Part 1: Data

The data were collected as part of the Behavioral Risk Factor Surveillance System (BRFSS). The project began in 1984 with only 15 states; in 2001 the number of participating states increased to 50. The data were collected through telephone interviews with the non-institutionalized adult population, aged 18 years or older, residing in the US [1]. The objective is to identify preventive practices and risk behaviors linked to chronic diseases [1].


Part 2: Research questions

1. How does alcohol consumption correlate with quantity of sleep for people 18 years or older who reside in the U.S.? Answering this question can improve our understanding of the effect that alcohol consumption has on quantity of sleep; it also provides a baseline against which to compare associations between quantity of sleep and other variables.

2. What is the effect of the interaction between alcohol and tobacco consumption on the quantity of sleep for people 18 years or older who reside in the U.S.? Building on the previous question, measuring the effect of alcohol and tobacco consumption on quantity of sleep can provide insight into how these two factors interact and possibly disturb sleep.

3. How does the number of children in a household affect the hours of sleep for people 18 years or older who reside in the U.S.? Lack of sleep can have a big impact on a person's quality of life; the number of children in a household can be a determining factor that reduces the hours of sleep.


Part 3: Exploratory data analysis

Research question 1:

Figure 1. a) Histogram of Drinks on a single occasion, b) histogram of hours of sleep and c) scatterplot of drinks on a single occasion vs hours of sleep.

##     maxdrnks         sleptim1      
##  Min.   : 1.00    Min.   :  0.000  
##  1st Qu.: 1.00    1st Qu.:  6.000  
##  Median : 2.00    Median :  7.000  
##  Mean   : 3.26    Mean   :  7.052  
##  3rd Qu.: 4.00    3rd Qu.:  8.000  
##  Max.   :97.00    Max.   :450.000  
##  NA's   :264958   NA's   :7387

Figure 2. Summary statistics of max number of drinks and hours of sleep.

The mean maximum number of drinks per occasion is 3.26, with a median of 2 (Figure 2). Figure 1 a) shows that the distribution is right skewed, with most of the observations falling on the left side of the plot. Hours of sleep is approximately normally distributed, with a mean of 7.05. Note that Figure 1 a) and b) plot only a subset of the observations, excluding outliers.
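As a rough illustration of how the association in research question 1 could be quantified, the sketch below computes a rank correlation between `maxdrnks` and `sleptim1`. It runs on a small made-up data frame rather than `brfss2013`, and the 24-hour cutoff for implausible sleep values (such as the maximum of 450 in Figure 2) is an assumption, not necessarily the filter used for Figure 1.

```r
# Toy data standing in for brfss2013; the values are made up for illustration.
toy <- data.frame(
  maxdrnks = c(1, 2, 2, 4, 6, NA, 3, 10),
  sleptim1 = c(8, 7, 450, 7, 6, 7, NA, 5)
)

# Drop rows with missing values, then rows with impossible sleep durations.
# The 24-hour cutoff is an assumption (a day has only 24 hours).
clean <- toy[complete.cases(toy), ]
clean <- clean[clean$sleptim1 <= 24, ]

# Spearman's rank correlation is less sensitive to the right skew of
# maxdrnks than Pearson's correlation.
rho <- cor(clean$maxdrnks, clean$sleptim1, method = "spearman")
print(nrow(clean))
print(rho)
```

A rank-based measure is used here only because the drinks variable is strongly skewed; Pearson's correlation on the filtered data would be an equally reasonable first look.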

Research question 2:

Figure 3. Stacked bar charts of hours of sleep and whether or not the person smoked at least 100 cigarettes.

##    Yes     No   NA's 
## 215201 261654  14920

Figure 4. Summary statistics of whether or not the person smoked at least 100 cigarettes.

In the available sample, fewer people had smoked at least 100 cigarettes (215,201) than had not (261,654); 14,920 responses are missing (Figure 4).
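To put the counts in Figure 4 on a common scale, the short sketch below converts them to shares among respondents who gave an answer. The counts are copied from the output above; the vector name `smoke_counts` is only illustrative.

```r
# Counts copied from the summary output in Figure 4.
smoke_counts <- c(Yes = 215201, No = 261654, Missing = 14920)

# Share of each answer among respondents who gave one (missing excluded).
answered <- smoke_counts[c("Yes", "No")]
shares <- answered / sum(answered)
print(round(shares, 3))
```

Under this calculation roughly 45% of those who answered reported having smoked at least 100 cigarettes.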

Research question 3:

Figure 5. a) Number of children in household, b) Number of children in household vs hours of sleep.

For the analysis of the relation between number of children and hours of sleep, we removed some outliers with the SD rule; however, this is not the best approach to handling outliers, and in the next report we will try better methods such as the local outlier factor (LOF) or distance-based detection with k-means. The distribution of children in households is right skewed (Figure 5 a)), with most households having 0 or 1 children. We also found households with more than 7 children; one possible explanation is that several families live in the same house.
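The SD rule mentioned above can be sketched as follows. The report does not state which multiple of the standard deviation was used, so the default of `k = 3` is an assumption, and `sd_filter` is a hypothetical helper name; the input vector is made-up toy data.

```r
# Keep only values within k standard deviations of the mean.
# k = 3 is an assumed default; the report does not specify the multiple.
sd_filter <- function(x, k = 3) {
  x <- x[!is.na(x)]
  centre <- mean(x)
  spread <- sd(x)
  x[abs(x - centre) <= k * spread]
}

# Toy vector of children per household with one extreme value (40).
children <- c(0, 0, 1, 2, 0, 1, 3, 0, 2, 1, 0, 1, 2, 0, 1, 0, 1, 2, 0, 40)
kept <- sd_filter(children)
print(length(kept))
print(max(kept))
```

One known weakness of this rule, consistent with the caveat in the text, is that the mean and standard deviation are themselves inflated by the outliers they are meant to detect.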

In the scatterplot of Figure 5 b) we observed a weak negative tendency: hours of sleep decrease as the number of children in the household increases. We will build upon this negative correlation in later reports.
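One simple way to put a number on such a tendency is the slope of a least-squares line. The sketch below fits one to a small made-up data frame; the column names `children` and `sleptim1` are assumed to mirror the BRFSS variables, and the values are not taken from the survey.

```r
# Toy data: number of children and hours of sleep (made-up values).
toy_kids <- data.frame(
  children = c(0, 0, 1, 1, 2, 2, 3, 4),
  sleptim1 = c(8, 7, 7, 7, 6, 7, 6, 5)
)

# Least-squares fit of sleep on number of children.
fit <- lm(sleptim1 ~ children, data = toy_kids)

# The slope estimates the change in hours of sleep per additional child.
slope <- coef(fit)[["children"]]
print(slope)
```

A negative slope on the real data would match the tendency seen in the scatterplot, although a weak correlation means the line explains little of the variation in sleep.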

Bibliography

  1. http://www.cdc.gov/brfss/annual_data/2013/pdf/Overview_2013.pdf