library(ggplot2)
library(dplyr)
library(reshape2)
load("brfss2013.RData")
The Behavioral Risk Factor Surveillance System (BRFSS) collects state-level data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. The BRFSS objective is to collect uniform data by means of landline and cellular telephone surveys. The dataset brfss2013 consists of:
The sample collection methodology: In conducting the BRFSS landline telephone survey, interviewers collect data from a randomly selected adult in a household. In conducting the cellular telephone version of the BRFSS questionnaire, interviewers collect data from an adult who participates by using a cellular telephone and resides in a private residence or college housing.
Data collection method implications: Because the sample was obtained either from a randomly selected adult in a household reached by landline telephone or from a randomly selected cellular telephone user who resides in a private residence or college housing, we cannot generalize the results to the entire U.S. population. The population was divided into homogeneous strata and respondents were then randomly sampled from within each stratum; U.S. residents with neither a landline nor a cellular telephone are not taken into account.
Scope of Inference: Each subject in a stratum is equally likely to be selected, so the sample is representative of the population from which it comes (landline or cellular telephone users). Because there is no random assignment, the groups being compared are not essentially the same, and causal conclusions cannot be made.
In short, we have an observational study: the results are generalizable to landline and cellular telephone users, but they are not causal.
Research question 1: First, we could ask whether there is a relationship between sleep habits and poor quality of life. The variables could be negatively correlated; in other words, as sleeping hours increase, the number of bad physical/mental health days decreases. We will use the following variables:
physhlth: Number Of Days Physical Health Not Good
menthlth: Number Of Days Mental Health Not Good
sleptim1: How Much Time Do You Sleep
Do the sleep hours have to do with the number of days of bad physical/mental health?
Research question 2: Second, we could be interested in general health status based on the number of adults in the household and the gender of the respondent. Health status could change depending on the number of adults in the household and on whether the respondent is male or female. More than two adults in a household could reduce the perception of good health. We will consider the following variables:
genhlth: General Health
sex: Respondents Sex
numadult: Number Of Adults In Household
Does the general health perception change depending on the number of adults in the household and the gender of the person?
Research question 3: Third, we may wonder whether physical activity has any impact on anxiety, and which type of physical activity may reduce or increase the average number of anxious days. We will consider the following variables:
exract11: Type Of Physical Activity
exerhmm1: Minutes Or Hours Walking, Running, Jogging, Or Swimming
qlstres2: How Many Days Felt Anxious In Past 30 Days
Does physical activity reduce the average number of anxious days? And if the answer is yes, which sort of activity reduces anxiety the most?
Research question 1: Do the sleep hours have to do with the number of days of bad physical/mental health?
# Let's select the variables we will work with and then summarise them.
sleep <- brfss2013 %>% select(sleptim1,physhlth,menthlth)
summary(sleep)
## sleptim1 physhlth menthlth
## Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 6.000 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 7.000 Median : 0.000 Median : 0.000
## Mean : 7.052 Mean : 4.353 Mean : 3.383
## 3rd Qu.: 8.000 3rd Qu.: 3.000 3rd Qu.: 2.000
## Max. :450.000 Max. :60.000 Max. :5000.000
## NA's :7387 NA's :10957 NA's :8627
Before making any plot, we need to clean up the data. The summary shows values outside the valid range: sleep time cannot exceed 24 hours per day, and the number of days of poor physical/mental health cannot exceed 30 days per month.
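Before filtering, it can help to count how many values actually fall outside the valid range. The sketch below uses a small made-up data frame (`toy`) in place of `brfss2013`; the bounds come from the text, while `toy`, `limits` and `out_of_range` are names introduced here only for illustration.

```r
# Toy data standing in for the three BRFSS columns (values are made up).
toy <- data.frame(
  sleptim1 = c(7, 8, 450, 6, NA),
  physhlth = c(0, 60, 3, 30, 2),
  menthlth = c(0, 2, 5000, NA, 1)
)

# Upper bounds taken from the text: 24 hours per day, 30 days per month.
limits <- c(sleptim1 = 24, physhlth = 30, menthlth = 30)

# Count the non-missing values above each bound, one count per variable.
out_of_range <- sapply(names(limits), function(v) {
  sum(toy[[v]] > limits[[v]], na.rm = TRUE)
})
print(out_of_range)
```

Running the same count on the real columns would show whether the filter removes a handful of data-entry errors or a meaningful share of the sample.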
# Removing possible outliers
sleep <- sleep %>% filter(sleptim1 <= 24, menthlth <= 30, physhlth <= 30)
# Group the data by sleeping hours and calculate the mean number of days of poor physical/mental health
sleep <- sleep %>% group_by(sleptim1) %>%
summarise(physic=mean(physhlth,na.rm=TRUE),mental=mean(menthlth, na.rm=TRUE))
# Stack the physic and mental columns into a single column
sleep <- sleep %>% melt(id.vars=c("sleptim1"),variable.name="TYPE",value.name="DAYS")
# Create a scatterplot. The point with an average of 30 days (at 23 sleeping hours) is dropped as an unusual value.
# subset() is used so that nothing is lost if no row matches.
ggplot(data=subset(sleep, DAYS != 30), aes(x=sleptim1, y=DAYS, color=TYPE)) + geom_point(size=2) +
labs(title="Sleeping Hours - Average Days Health Not Good",
y="Average Days Health Not Good",x="Sleeping Hours")
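The first research question hypothesizes a negative correlation, which a rank correlation on the grouped means could quantify alongside the plot. This is a minimal sketch on made-up values, not the report's actual result; `toy`, `rho_physic` and `rho_mental` are hypothetical names, and the columns mirror the `sleptim1`, `physic` and `mental` columns created before the data are stacked.

```r
# Made-up grouped means: average bad-health days per sleeping hour.
toy <- data.frame(
  sleptim1 = c(4, 5, 6, 7, 8),
  physic   = c(9.1, 7.4, 5.0, 3.2, 3.0),
  mental   = c(8.0, 6.5, 4.1, 2.6, 2.4)
)

# Spearman's rho measures monotone association and is less sensitive
# to extreme values than Pearson's r.
rho_physic <- cor(toy$sleptim1, toy$physic, method = "spearman")
rho_mental <- cor(toy$sleptim1, toy$mental, method = "spearman")

print(c(physic = rho_physic, mental = rho_mental))
```

If the pattern in the real plot is not monotone, a single coefficient will understate the relationship, so the number should be read together with the scatterplot.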
Comments:
A relationship between Sleeping Hours and Average Days Health Not Good is now identified, but this does not guarantee that the relationship between the two variables is causal.
Research question 2: Does the general health perception change depending on the number of adults in the household and the gender of the person?
# Let's select the variables we will work with
household <- brfss2013 %>% select(genhlth,sex,numadult)
# Removing the NA values from each variable
household <- household %>% filter(complete.cases(household))
Now we are working with categorical variables, so we need to calculate the frequency of the answers rated as poor, fair, good, very good and excellent. Also, for better visualization, we will keep only households with fewer than 5 adults.
# Create the frequency table
household <- as.data.frame(table(household))
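# Keep only households with fewer than 5 adults. numadult is a factor here, so compare
# its labels (via as.character) rather than its internal level codes.
household <- household %>% filter(as.integer(as.character(numadult))<5)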
# Create a barplot
ggplot(household, aes(fill=sex,y=Freq/1e3,x=genhlth)) + geom_bar(position="stack", stat="identity") +
ggtitle("General Health According to Adults in Household") + ylab("Frequency of responses (K)") + xlab("General Health") +
facet_wrap( ~numadult , ncol=2) +
theme(plot.title = element_text(hjust = 0.5)) +
theme(axis.text.x = element_text(angle=30, vjust=1, hjust=1))
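Raw counts are hard to compare across panels when household sizes differ in how many respondents they contain. One possible complement, sketched here on made-up data, is to convert the counts into shares within each household size; `toy`, `counts` and `shares` are hypothetical names used only for this example.

```r
# Made-up answers for two household sizes.
toy <- data.frame(
  genhlth  = c("Good", "Good", "Poor", "Good", "Poor", "Poor"),
  numadult = c(1, 1, 1, 2, 2, 2)
)

# Cross-tabulate household size (rows) against general health (columns).
counts <- table(toy$numadult, toy$genhlth)

# margin = 1 normalizes each row, so every household size sums to 1.
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

With shares, a shift in the distribution of answers between household sizes is visible even when one group is much larger than another.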
Comments:
Research question 3: Does physical activity reduce the average number of anxious days? And if the answer is yes, which sort of activity reduces anxiety the most?
# Let's select the variables we will work with
anxiety <- brfss2013 %>% select(exerhmm1,qlstres2,exract11)
anxiety <- anxiety %>% filter(complete.cases(anxiety))
names(anxiety) <- c("exercise","anxiety","type")
# Convert the factor to character so the activities can be regrouped into new levels
anxiety$type <- as.character(anxiety$type)
# New factor levels
strength <- c("machine|weight")
cardio <- c("dance|aerobics|bicycling|dancing-ballet|jogging|running|walking|hiking")
sports <- c("basketball|calisthenics|softball")
focus <- c("carpentry|fishing|gardening|golf|household|hunting|yard|yoga|horseback")
# Search matches
anxiety$type <- case_when(grepl(strength, tolower(anxiety$type)) ~ "strength",
grepl(sports, tolower(anxiety$type)) ~ "sports",
grepl(focus, tolower(anxiety$type)) ~ "focus",
grepl(cardio, tolower(anxiety$type)) ~ "cardio",
tolower(anxiety$type)=="other" ~ "other")
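The matching step above can be checked on a few labels before trusting it on the full column. This is a small sketch with made-up activity labels and shortened patterns, assuming only that `case_when()` evaluates its conditions in order and returns NA when none matches; `toy_type`, `labels` and `toy_group` are hypothetical names.

```r
library(dplyr)

# Made-up activity labels; "Bowling" matches none of the patterns below.
toy_type <- c("Weight lifting", "Yoga", "Running", "Bowling", "Other")
labels <- tolower(toy_type)

# The first condition that is TRUE wins; unmatched labels become NA.
toy_group <- case_when(
  grepl("machine|weight", labels)  ~ "strength",
  grepl("yoga|golf", labels)       ~ "focus",
  grepl("running|walking", labels) ~ "cardio",
  labels == "other"                ~ "other"
)

# Show how many labels fell into each group, including unmatched ones.
print(table(toy_group, useNA = "ifany"))
```

Tabulating with `useNA = "ifany"` makes the unmatched activities visible, which helps decide whether a pattern is missing or whether those rows should be dropped explicitly.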
To facilitate the analysis, we group the variable exract11 into 5 categories instead of the 26 unique values that exist. The activities within a category can be quite varied, but we assume the following groups:
strength: Activities that are highly demanding for muscles.
cardio: Activities that require great physical condition.
sports: Activities that are highly demanding for both muscles and physical condition.
focus: Activities that require deep concentration.
other: Activities not specified in the original dataset.
# Create a scatterplot
ggplot(data=anxiety, aes(x=exercise, y=anxiety, color=type)) + geom_point(size=2) +
labs(title="Minutes of Physical Activity - Anxious Days",
y="Days Felt Anxious In Past 30 Days",x="Minutes of Physical Activity") + ylim(0,35)
## Warning: Removed 1 rows containing missing values (geom_point).
Comments:
Minutes of Physical Activity and Days Felt Anxious In Past 30 Days. This result may be due to the great variability in the type of activities and the size of the sample, but it allows us to ask other interesting questions, such as: are the people who practice cardio the most anxious, and the people who practice concentration activities the least?
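Since the research question is phrased in terms of the average number of anxious days, a per-group summary could complement the scatterplot. The sketch below runs on made-up numbers, so its output says nothing about the real data; `toy` and `by_type` are hypothetical names, and the columns follow the renamed `type` and `anxiety` columns used above.

```r
library(dplyr)

# Made-up observations: activity group and anxious days in the past 30 days.
toy <- data.frame(
  type    = c("cardio", "cardio", "focus", "focus", "strength"),
  anxiety = c(10, 20, 2, 4, 6),
  stringsAsFactors = FALSE
)

# Group size, mean and median anxious days per activity group,
# sorted from the highest mean to the lowest.
by_type <- toy %>%
  group_by(type) %>%
  summarise(n = n(),
            mean_days = mean(anxiety),
            median_days = median(anxiety)) %>%
  arrange(desc(mean_days))

print(by_type)
```

Reporting the group size next to each mean matters here, because a group with very few respondents can produce an extreme average by chance. As with the rest of the study, any difference between groups would be an association, not evidence that an activity causes or reduces anxiety.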