Setup

setwd("E:/STAT50")
getwd()
## [1] "E:/STAT50"

Load Packages

library(dplyr)

Load data

load("brfss2013.RData")

Part 1: Research questions

Research question 1: Using the variables “sleptim1” and “marital”, determine the following:

Research question 1.1: Number of observations that are “NA” in the variable “sleptim1”.

Research question 1.2: Number of observations having at most 5 hours of sleep.

Research question 1.3: Number of observations having more than 5 but fewer than 11 hours of sleep.

Research question 1.4: Number of observations having at least 11 hours of sleep.

Research question 1.5: Number of married respondents having at most 5 hours of sleep.
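The sub-questions split the non-missing values of “sleptim1” into three non-overlapping ranges. A minimal sketch of that split with base R's cut(), assuming a small hypothetical vector `hours` in place of brfss2013$sleptim1 and hypothetical group labels:

```r
# Hypothetical toy data standing in for brfss2013$sleptim1 (whole hours, with NA)
hours <- c(NA, 4, 5, 6, 8, 10, 11, 12, NA)

# Breaks give (-Inf, 5], (5, 10], (10, Inf]; for whole hours this is
# "at most 5", "more than 5 but fewer than 11", "at least 11"
sleep_group <- cut(hours,
                   breaks = c(-Inf, 5, 10, Inf),
                   labels = c("at most 5", "6 to 10", "at least 11"))

# useNA = "ifany" adds a count for the missing values (question 1.1)
table(sleep_group, useNA = "ifany")
```

Because cut() closes intervals on the right by default, a value of exactly 5 lands in the first group and exactly 11 in the last, matching the wording of the questions.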

Part 3: Exploratory data analysis

Perform exploratory data analysis (EDA) that addresses each of the research questions outlined above. Your EDA should contain numerical summaries and visualizations. Each R output and plot should be accompanied by a brief interpretation.
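For the visualization part, one option is a bar chart of how often each whole number of hours is reported. A sketch with base R's barplot(), assuming a hypothetical toy vector `hours` in place of brfss2013$sleptim1:

```r
# Hypothetical toy data standing in for brfss2013$sleptim1
hours <- c(NA, 4, 5, 6, 7, 7, 8, 8, 8, 11)

# Frequency of each reported value; table() leaves out NA by default
freq <- table(hours)

# Bar chart of the frequencies (draws on the current graphics device)
barplot(freq,
        xlab = "Hours of sleep (sleptim1)",
        ylab = "Number of respondents",
        main = "Reported hours of sleep")
```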

Research question 1.1: Number of observations that are “NA” in the variable “sleptim1”.

str(select(brfss2013, sleptim1))
## 'data.frame':    491775 obs. of  1 variable:
##  $ sleptim1: int  NA 6 9 8 6 8 7 6 8 8 ...
brfss2013 %>%
  group_by(sleptim1) %>%
  filter(is.na(sleptim1)) %>%
  summarize(count=n())
## # A tibble: 1 × 2
##   sleptim1 count
##      <int> <int>
## 1       NA  7387
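The grouped pipeline returns a one-row tibble. The same count can be read off with base R by summing the logical vector from is.na(); applied to brfss2013$sleptim1 it gives the same 7387. A minimal sketch on a hypothetical toy data frame `toy` that reuses the column name:

```r
# Hypothetical toy data frame with the same column name as brfss2013
toy <- data.frame(sleptim1 = c(NA, 6, 9, NA, 7))

# is.na() gives TRUE/FALSE per row; sum() counts the TRUEs
n_missing <- sum(is.na(toy$sleptim1))
n_missing

# colSums() reports the NA count for every column at once
colSums(is.na(toy))
```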

Research question 1.2: Number of observations having at most 5 hours of sleep.

short_sleep <- brfss2013 %>%
  group_by(sleptim1) %>%
  filter(sleptim1 <= 5) %>%   # "at most 5 hours" includes 5; NA rows are dropped
  summarize(Frequency = n())

sum(short_sleep$Frequency)
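“At most 5 hours” includes respondents who report exactly 5, which `<` excludes and `<=` keeps. A small sketch on a hypothetical vector shows the difference; `na.rm = TRUE` mirrors filter(), which drops rows where the condition is NA:

```r
# Hypothetical toy data standing in for brfss2013$sleptim1
hours <- c(NA, 3, 4, 5, 5, 6, 8)

# Strictly fewer than 5 hours: misses the two respondents reporting exactly 5
n_lt5 <- sum(hours < 5, na.rm = TRUE)

# At most 5 hours: includes them
n_le5 <- sum(hours <= 5, na.rm = TRUE)

c(fewer_than_5 = n_lt5, at_most_5 = n_le5)
```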

Research question 1.3: Number of observations having more than 5 but fewer than 11 hours of sleep.

mid_sleep <- brfss2013 %>%
  group_by(sleptim1) %>%
  filter(sleptim1 > 5, sleptim1 < 11) %>%   # whole hours 6 to 10; NA rows are dropped
  summarize(Frequency = n())

sum(mid_sleep$Frequency)
## [1] 425670
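Since “sleptim1” holds whole hours (str() reports it as int), “more than 5 but fewer than 11” is the same set as 6 through 10, so the condition can also be written with %in%. A sketch comparing both forms on a hypothetical vector; note that `NA %in% 6:10` is FALSE rather than NA, so that form needs no `na.rm`:

```r
# Hypothetical toy data standing in for brfss2013$sleptim1
hours <- c(NA, 5, 6, 7, 10, 11, 8)

# Range comparison; comparisons involving NA are dropped with na.rm = TRUE
n_range <- sum(hours > 5 & hours < 11, na.rm = TRUE)

# Set membership: only equivalent because the hours are whole numbers
n_in <- sum(hours %in% 6:10)

stopifnot(n_range == n_in)
n_range
```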

Research question 1.4: Number of observations having at least 11 hours of sleep.

long_sleep <- brfss2013 %>%
  group_by(sleptim1) %>%
  filter(sleptim1 >= 11) %>%   # "at least 11 hours" includes 11; NA rows are dropped
  summarize(Frequency = n())

sum(long_sleep$Frequency)
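A quick check on the partition is that the NA count and the three range counts add up to the number of rows. Applied to brfss2013, the four counts should sum to the 491775 observations reported by str(); a shortfall means some value (for example exactly 5 or exactly 11 hours) falls in no group. A sketch on hypothetical toy data:

```r
# Hypothetical toy data standing in for brfss2013$sleptim1
hours <- c(NA, 2, 5, 6, 10, 11, 15, NA)

n_na   <- sum(is.na(hours))                          # question 1.1
n_low  <- sum(hours <= 5, na.rm = TRUE)              # question 1.2: at most 5
n_mid  <- sum(hours > 5 & hours < 11, na.rm = TRUE)  # question 1.3
n_high <- sum(hours >= 11, na.rm = TRUE)             # question 1.4: at least 11

# Every row should land in exactly one group
stopifnot(n_na + n_low + n_mid + n_high == length(hours))
c(missing = n_na, low = n_low, mid = n_mid, high = n_high)
```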

Research question 1.5: Number of married respondents having at most 5 hours of sleep.

str(select(brfss2013, sleptim1, marital))
## 'data.frame':    491775 obs. of  2 variables:
##  $ sleptim1: int  NA 6 9 8 6 8 7 6 8 8 ...
##  $ marital : Factor w/ 6 levels "Married","Divorced",..: 2 1 1 1 1 2 1 3 1 1 ...
married_short <- brfss2013 %>%
  group_by(sleptim1) %>%
  filter(sleptim1 <= 5, marital == "Married") %>%   # both conditions must hold
  summarize(Frequency = n())

sum(married_short$Frequency)
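To put the married count in context, a cross-tabulation of short sleep against marital status shows the count in every category, and prop.table() turns each row into shares. A sketch on a hypothetical toy data frame; only the levels "Married" and "Divorced" are visible in the str() output, so the toy uses just those two:

```r
# Hypothetical toy data frame with the same column names as brfss2013
toy <- data.frame(
  sleptim1 = c(4, 5, 7, 3, 8, NA, 5),
  marital  = factor(c("Married", "Divorced", "Married", "Married",
                      "Divorced", "Married", "Married"),
                    levels = c("Married", "Divorced"))
)

# Flag short sleepers (at most 5 hours); NA sleep time stays NA
toy$short_sleep <- toy$sleptim1 <= 5

# Counts within each marital status; table() drops rows where short_sleep is NA
tab <- table(toy$marital, toy$short_sleep)
tab

# Share of short and not-short sleepers within each marital group (rows sum to 1)
prop.table(tab, margin = 1)
```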