# Set the working directory and load the packages and the data
setwd("D:/Caraga PeerProject/")
library(ggplot2)
library(dplyr)
load("brfss2013.RData")

The Behavioral Risk Factor Surveillance System (BRFSS) is the nation’s premier system of health-related telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world. The 2013 survey data contain 491,775 respondents and 330 variables. These are some of the characteristics of the brfss2013 data:
Answer: Our data come from a BRFSS survey on the health status and health-related issues of adult U.S. residents. The “brfss2013” data set contains records for 491,775 respondents, with a total of 330 variables used as metrics for evaluation. Health characteristics estimated from the BRFSS pertain to the non-institutionalized adult population, aged 18 years or older, who reside in the US. In 2013, additional question sets were included as optional modules to provide a measure for several childhood health and wellness indicators, including asthma prevalence for people aged 17 years or younger.
Answer: These data are useful for studying and analysing the many factors that shape the health-related concerns of U.S. residents, and for informing decisions about the health welfare of the community.
The data were collected through a survey conducted by landline telephone and cellular telephone.
In the landline telephone survey, interviewers collected data from a randomly selected adult in a household. Disproportionate stratified sampling (DSS) was used for the landline sample.
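To make the idea of disproportionate stratified sampling concrete, here is a minimal sketch in base R. The sampling frame, the stratum names and the sampling fractions are all hypothetical and are not taken from the BRFSS documentation; the point is only that strata are sampled at different rates and that each sampled unit is weighted by the inverse of its sampling fraction.

```r
set.seed(1)

# Hypothetical sampling frame: 200 numbers in a "high" stratum, 800 in a "medium" stratum
frame <- data.frame(
  id = 1:1000,
  stratum = rep(c("high", "medium"), times = c(200, 800)),
  stringsAsFactors = FALSE
)

# Hypothetical sampling fractions: the "high" stratum is deliberately oversampled
rates <- c(high = 0.5, medium = 0.1)

# Draw a simple random sample inside each stratum at that stratum's rate
picked <- unlist(lapply(names(rates), function(s) {
  ids <- frame$id[frame$stratum == s]
  sample(ids, size = round(length(ids) * rates[[s]]))
}))
dss_sample <- frame[frame$id %in% picked, ]

# Design weight = inverse of the sampling fraction of the unit's stratum
dss_sample$weight <- unname(1 / rates[as.character(dss_sample$stratum)])

table(dss_sample$stratum)   # sample size per stratum
sum(dss_sample$weight)      # weighted total recovers the frame size
```

Because the strata are sampled at unequal rates, unweighted summaries of such a sample over-represent the oversampled stratum; the weights correct for that.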
In the cellular telephone survey, interviewers collected data from an adult who participated by cellular telephone and resided in a private residence or college housing. The sample was randomly generated from a sampling frame of confirmed cellular area code and prefix combinations.
Regarding the scope of inference, the BRFSS is an observational study: respondents are randomly sampled, as described above, but they are not randomly assigned to any condition. Hence, the collected data support descriptive statistics such as trend analysis and the study of associations, not causal conclusions.
Research Questions:
Research question 1: Exploratory data analysis of the variable sleptim1 in terms of the following:
1.1 What are its statistics using the function summary in R?
1.2 Provide statistics using the function summary for the data without NAs and for the data with at most 10 hours of sleep.
Research question 2: What analysis can you share on depressive disorder as perceived by others (addepev2: ever told they have a depressive disorder) among respondents who sleep less than 6 hours on average?
Research question 3: What insights can you provide by comparing respondents who sleep less than 6 hours with those who sleep 6 to 10 hours, in terms of depressive disorder as perceived by others?
Research question 1:
Using the function summary in R on all the observations of the sleptim1 variable
summary(brfss2013$sleptim1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##   0.000   6.000   7.000   7.052   8.000 450.000    7387
Using the function summary in R without the NA’s
NoNaSleep <- brfss2013 %>%
  filter(!is.na(sleptim1))
summary(NoNaSleep$sleptim1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   0.000   6.000   7.000   7.052   8.000 450.000
Using the function summary in R for respondents with at most 10 hours of average sleep in the variable sleptim1
Atmost10hours <- NoNaSleep %>%
  filter(sleptim1 < 11)
summary(Atmost10hours$sleptim1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   0.000   6.000   7.000   6.976   8.000  10.000
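The three summaries agree on the minimum, the quartiles and the median. Removing the NAs changes nothing except that the NA count disappears. Restricting the data to at most 10 hours lowers the maximum from 450 to 10 and the mean slightly, from 7.052 to 6.976. This shows that only a few observations report more than 10 hours of sleep on average; a value such as 450 hours cannot be a valid daily average and is most likely a recording error.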
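The claim that only a few observations lie beyond 10 hours can be checked by counting them directly. The sketch below uses a small hypothetical vector in place of brfss2013$sleptim1, so the numbers it produces are illustrative only; the same lines could be applied to the real column.

```r
# Hypothetical stand-in for brfss2013$sleptim1 (hours of sleep, with one NA)
sleep_hours <- c(0, 5, 6, 7, 7, 8, 9, 12, 24, 450, NA)

# Drop missing values before counting
valid <- sleep_hours[!is.na(sleep_hours)]

# Flag observations above 10 hours, then count them and take their share
over10 <- valid > 10
n_over10 <- sum(over10)
share_over10 <- mean(over10)

# Values above 24 cannot be a daily average and point to recording errors
impossible <- valid[valid > 24]

n_over10
share_over10
impossible
```

Counting the excluded rows this way makes it explicit how much data the "at most 10 hours" filter removes, rather than inferring it from the change in the mean.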
Research question 2:
sl_dep5 <- NoNaSleep %>%
  filter(!is.na(sleptim1), !is.na(addepev2), sleptim1 < 6) %>%
  group_by(addepev2) %>%
  summarise(count = n())
## `summarise()` ungrouping output (override with `.groups` argument)
sl_dep5
## # A tibble: 2 x 2
##   addepev2 count
##   <fct>    <int>
## 1 Yes      17828
## 2 No       34275
ggplot(data = sl_dep5, aes(x = addepev2, y = count)) +
  geom_bar(stat = "identity", color = "blue", fill = "green") +
  xlab("Depressive Disorder for people having at most 5 hours of average sleep") +
  ylab("Number of respondents")
(17828 / (17828 + 34275))
## [1] 0.3421684
The above results show that 34.22% of the respondents who sleep less than 6 hours on average have a depressive disorder as perceived by others. This means that roughly 1 out of 3 of those who sleep less than 6 hours on average is perceived by others as having a depressive disorder.
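The share can also be derived from the counts without typing the numbers by hand. This is a minimal base R sketch that re-enters the two counts from the table above; with the real data, the same step could be applied to the count column of sl_dep5.

```r
# Counts copied from the sl_dep5 table (respondents sleeping under 6 hours)
counts <- c(Yes = 17828, No = 34275)

# prop.table() divides each count by the total, giving shares that sum to 1
props <- prop.table(counts)

# Express the shares as percentages with two decimals
pct <- round(100 * props, 2)
pct
```

Computing the percentage from the counts keeps the reported figure consistent with the table even if the filter or the data change.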
Research question 3:
sl_dep6 <- Atmost10hours %>%
  filter(!is.na(sleptim1), !is.na(addepev2), sleptim1 > 5) %>%
  group_by(addepev2) %>%
  summarise(count = n())
## `summarise()` ungrouping output (override with `.groups` argument)
sl_dep6
## # A tibble: 2 x 2
##   addepev2  count
##   <fct>     <int>
## 1 Yes       73771
## 2 No       350259
ggplot(data = sl_dep6, aes(x = addepev2, y = count)) +
  geom_bar(stat = "identity", color = "blue", fill = "green") +
  xlab("Depressive Disorder for people having 6 to 10 hours average sleep") +
  ylab("Number of respondents")
(73771 / (73771 + 350259))
## [1] 0.1739759
Comparing the result of Question 2 with that of Question 3, respondents with 6 to 10 hours of average sleep have a lower rate of depressive disorder as perceived by others, at 17.40%. This is about half the rate found among those with less than 6 hours of average sleep (34.22%). Because the study is observational, this tells us that sleeping 6 to 10 hours on average is associated with a lower rate of depressive disorder as perceived by others; it does not show that more sleep lowers that rate.
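The comparison between the two sleep groups can be laid out as one 2 x 2 table. The sketch below re-enters the four counts reported in Questions 2 and 3 and computes the within-group shares, their difference and their ratio. It is purely descriptive and ignores the survey weights, which is an assumption of this sketch rather than a property of the BRFSS design.

```r
# Counts copied from the sl_dep5 and sl_dep6 tables
tab <- matrix(
  c(17828,  34275,
    73771, 350259),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    sleep = c("under 6 hours", "6 to 10 hours"),
    depressive_disorder = c("Yes", "No")
  )
)

# Share of "Yes" and "No" within each sleep group (rows sum to 1)
row_share <- prop.table(tab, margin = 1)

# Difference and ratio of the "Yes" shares between the two groups
share_short <- row_share["under 6 hours", "Yes"]
share_long  <- row_share["6 to 10 hours", "Yes"]
share_diff  <- share_short - share_long
share_ratio <- share_short / share_long

round(row_share, 4)
share_diff
share_ratio
```

A ratio close to 2 is what "about half" refers to: the share in the short-sleep group is nearly twice the share in the 6 to 10 hour group.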