setwd("E:/STAT50")
getwd()
## [1] "E:/STAT50"
library(dplyr)
load("brfss2013.RData")
Research question 1: Using the variables “sleptim1” and “marital”, determine the following:
Research question 1.1: Number of observations that are “NA” in the variable “sleptim1”.
Research question 1.2: Number of observations having at most 5 hours of sleep.
Research question 1.3: Number of observations having more than 5 hours of sleep but less than 11 hours of sleep.
Research question 1.4: Number of observations having at least 11 hours of sleep.
Research question 1.5: Number of observations having at most 5 hours of sleep whose marital status is “Married”.
Perform exploratory data analysis (EDA) that addresses each of the three research questions you outlined above. Your EDA should contain numerical summaries and visualizations. Each R output and plot should be accompanied by a brief interpretation.
Research question 1.1: Number of observations that are “NA” in the variable “sleptim1”.
str(select(brfss2013, sleptim1))
## 'data.frame': 491775 obs. of 1 variable:
## $ sleptim1: int NA 6 9 8 6 8 7 6 8 8 ...
# Keep only the rows where sleptim1 is missing, then count them
brfss2013 %>%
  group_by(sleptim1) %>%
  filter(is.na(sleptim1)) %>%
  summarize(count = n())
## # A tibble: 1 × 2
## sleptim1 count
## <int> <int>
## 1 NA 7387
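The same kind of count can be cross-checked in base R by summing a logical vector. This is a minimal sketch on a made-up vector (`sleep_hours` is a hypothetical stand-in for `brfss2013$sleptim1`, not the real data):

```r
# Hypothetical toy vector standing in for brfss2013$sleptim1
sleep_hours <- c(NA, 6, 9, 8, NA, 5, 11, 4)

# is.na() returns TRUE/FALSE per element; sum() counts the TRUE values
na_count <- sum(is.na(sleep_hours))
na_count
```

On the real data, `sum(is.na(brfss2013$sleptim1))` would be expected to agree with the grouped count above.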
Research question 1.2: Number of observations having at most 5 hours of sleep.
str(select(brfss2013, sleptim1))
## 'data.frame': 491775 obs. of 1 variable:
## $ sleptim1: int NA 6 9 8 6 8 7 6 8 8 ...
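# Note: `< 5` keeps strictly fewer than 5 hours; "at most 5" would be `<= 5`.
# The count printed below comes from the `< 5` filter as run.
S <- brfss2013 %>%
  group_by(sleptim1) %>%
  filter(sleptim1 < 5) %>%
  summarize(Frequency = n())
sum(S$Frequency)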
## [1] 19062
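The question asks for "at most 5 hours", while the filter above uses `<`, which leaves out respondents reporting exactly 5 hours. The sketch below shows the difference on a made-up vector (`sleep_hours` is hypothetical, not taken from the data set):

```r
# Hypothetical toy vector; the two 5s are the boundary cases
sleep_hours <- c(NA, 3, 4, 5, 5, 6, 7)

# Strictly fewer than 5 hours: counts 3 and 4 only
fewer_than_5 <- sum(sleep_hours < 5, na.rm = TRUE)

# At most 5 hours: also counts the two 5s
at_most_5 <- sum(sleep_hours <= 5, na.rm = TRUE)

c(fewer_than_5 = fewer_than_5, at_most_5 = at_most_5)
```

`na.rm = TRUE` is needed here because comparing `NA` with a number yields `NA`; `dplyr::filter()` drops such rows automatically.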
Research question 1.3: Number of observations having more than 5 hours of sleep but less than 11 hours of sleep.
str(select(brfss2013, sleptim1))
## 'data.frame': 491775 obs. of 1 variable:
## $ sleptim1: int NA 6 9 8 6 8 7 6 8 8 ...
# Named T_sleep rather than T, because T is R's built-in shorthand for TRUE
T_sleep <- brfss2013 %>%
  group_by(sleptim1) %>%
  filter(sleptim1 > 5, sleptim1 < 11) %>%
  summarize(Frequency = n())
sum(T_sleep$Frequency)
## [1] 425670
Research question 1.4: Number of observations having at least 11 hours of sleep.
str(select(brfss2013, sleptim1))
## 'data.frame': 491775 obs. of 1 variable:
## $ sleptim1: int NA 6 9 8 6 8 7 6 8 8 ...
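# Note: `> 11` keeps strictly more than 11 hours; "at least 11" would be `>= 11`.
# The count printed below comes from the `> 11` filter as run.
U <- brfss2013 %>%
  group_by(sleptim1) %>%
  filter(sleptim1 > 11) %>%
  summarize(Frequency = n())
sum(U$Frequency)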
## [1] 5387
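A quick consistency check is to see whether the groups account for every row. The four counts reported so far (7387, 19062, 425670, 5387) add up to fewer than the 491775 observations. Since `sleptim1` is an integer, the rows left over can only be those reporting exactly 5 or exactly 11 hours, which the strict inequalities exclude. The sketch below does that arithmetic and then shows, on a made-up vector, one way to build groups that cannot leave gaps, using `cut()`. The vector `sleep_hours` and the group labels are assumptions for illustration:

```r
# Counts and total copied from the outputs shown above
reported   <- c(na = 7387, under_5 = 19062, between = 425670, over_11 = 5387)
total_rows <- 491775
unassigned <- total_rows - sum(reported)
unassigned  # 34269 rows fall in none of the four groups

# Hypothetical toy vector standing in for brfss2013$sleptim1
sleep_hours <- c(NA, 1, 5, 6, 10, 11, 12, 24)

# cut() with right-closed intervals: (-Inf, 5], (5, 10], (10, Inf)
sleep_group <- cut(
  sleep_hours,
  breaks = c(-Inf, 5, 10, Inf),
  labels = c("at most 5", "6 to 10", "at least 11")
)

# useNA = "ifany" keeps the missing values as their own category
counts <- table(sleep_group, useNA = "ifany")
counts
```

Because every non-missing value falls into exactly one interval, the counts always sum to the number of rows.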
Research question 1.5: Number of observations having at most 5 hours of sleep whose marital status is “Married”.
str(select(brfss2013, sleptim1, marital))
## 'data.frame': 491775 obs. of 2 variables:
## $ sleptim1: int NA 6 9 8 6 8 7 6 8 8 ...
## $ marital : Factor w/ 6 levels "Married","Divorced",..: 2 1 1 1 1 2 1 3 1 1 ...
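# Note: `< 5` keeps strictly fewer than 5 hours; "at most 5" would be `<= 5`.
# The count printed below comes from the `< 5` filter as run.
V <- brfss2013 %>%
  group_by(sleptim1) %>%
  filter(sleptim1 < 5, marital == "Married") %>%
  summarize(Frequency = n())
sum(V$Frequency)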
## [1] 6935
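Combining a condition on sleep with a condition on marital status amounts to a logical AND across two columns. The sketch below illustrates this on a small made-up data frame (`toy` and its values are hypothetical), using `<=` so that the boundary value of 5 hours is included:

```r
# Hypothetical toy data with the same two column names as the survey
toy <- data.frame(
  sleptim1 = c(4, 5, 7, NA, 3, 5),
  marital  = factor(c("Married", "Married", "Divorced",
                      "Married", "Divorced", NA))
)

# TRUE only where both conditions hold; NA where either cannot be decided
both <- toy$sleptim1 <= 5 & toy$marital == "Married"

# na.rm = TRUE skips the undecidable rows, as dplyr::filter() does
short_sleep_married <- sum(both, na.rm = TRUE)
short_sleep_married
```

Rows with a missing sleep time or a missing marital status are not counted in either direction, so the result is a count of confirmed cases only.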