Research question 1: What is the relationship between perceived physical health and mental health? To answer this question, I will look at the following variables:
genhlth : General health, rated from Poor to Excellent
menthlth : Number of days (out of the past 30) mental health not good
Research question 2: Does the number of not-good mental health days differ by gender? Does it differ by state? To answer this question, I will look at the following variables:
sex : Respondent’s sex
X_state : Respondent’s state where they were surveyed
menthlth : Number of days (out of the past 30) mental health not good
Research question 3: Finally, I will explore the relationship between bad mental health days, gender, and reported general health. Are females more likely to report good general health despite having more bad mental health days?
Research question 1:
# subset dataframe to only the columns of interest
q1 <- select(brfss2013, genhlth, menthlth) %>% na.omit()
# bucket number of mental health days into groups - low, medium, high
q1$menthlth_group<-ifelse(q1$menthlth < 5, "low",
ifelse(q1$menthlth>=5 & q1$menthlth<10,"medium","high"))
prop.table(table(q1$genhlth, q1$menthlth_group), 2)
##                   high        low     medium
##   Excellent 0.06467717 0.19821348 0.11355926
##   Very good 0.18121018 0.35110083 0.31544238
##   Good      0.29034201 0.30806971 0.33469294
##   Fair      0.27195791 0.10936918 0.17676209
##   Poor      0.19181274 0.03324680 0.05954332
Looking at the distribution of participants' general health ratings within the low, medium, and high groups of “mental health not good” days, those with a low number of bad mental health days rate their general health as excellent most often of the three groups and as poor least often.
The opposite is true for participants in the high bad mental health days group.
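The table columns above come out in alphabetical order (high, low, medium) because the group labels are plain character strings. One possible alternative, shown here only as a sketch on made-up values rather than the real menthlth column, is to build the groups with cut() so they carry an explicit low < medium < high order:

```r
# Toy values standing in for brfss2013$menthlth (made up for illustration).
days <- c(0, 2, 4, 5, 9, 10, 30)

# right = FALSE closes each interval on the left: [0,5), [5,10), [10,Inf),
# which matches the ifelse() cut-offs used above.
menthlth_group <- cut(days,
                      breaks = c(0, 5, 10, Inf),
                      labels = c("low", "medium", "high"),
                      right = FALSE,
                      ordered_result = TRUE)

# Counts are now reported in low, medium, high order.
table(menthlth_group)
```

With an ordered factor, tables and plot axes built from the grouping should follow the natural order without any extra relabelling.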
g1 <- ggplot(q1) + aes(x=menthlth_group, fill=genhlth) + geom_bar(position="fill")
g1 <- g1 + xlab("# of Bad Mental Health Days") + ylab("Proportion") + scale_fill_discrete(name="Reported General Health")
g1
You can see in the stacked bar chart above that participants who report the most bad mental health days also most frequently report poor general health.
Conversely, participants who reported the fewest bad mental health days reported the best overall general health.
It seems as though reported mental health is associated with reported general health, although this exploratory comparison does not establish the direction or cause of the relationship.
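To go beyond eyeballing the chart, one option is a chi-squared test of independence together with an effect size such as Cramér's V. The sketch below uses hypothetical counts that I made up for illustration; they are not taken from brfss2013, and the test itself ignores the survey's sampling design.

```r
# Hypothetical counts (NOT from brfss2013): rows are general health ratings,
# columns are mental health groups.
counts <- matrix(c(200, 110,  60,
                   350, 310, 180,
                   310, 330, 290,
                   110, 180, 270,
                    30,  70, 200),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(
                   genhlth = c("Excellent", "Very good", "Good", "Fair", "Poor"),
                   menthlth_group = c("low", "medium", "high")))

# Chi-squared test of independence between the two ratings
test <- chisq.test(counts)
test$statistic
test$p.value

# Cramer's V: chi-squared rescaled to the 0-1 range as an effect size
n <- sum(counts)
cramers_v <- sqrt(unname(test$statistic) / (n * (min(dim(counts)) - 1)))
cramers_v
```

On the real data the same call would be chisq.test(table(q1$genhlth, q1$menthlth_group)). With several hundred thousand respondents almost any difference will be statistically significant, which is why an effect size is worth reporting alongside the p-value.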
Research question 2:
# subset data to male & female to get two dataframes
male_q2 <- brfss2013 %>%
filter(sex == "Male")
female_q2 <- brfss2013 %>%
filter(sex == "Female")
male_q2 <- select(male_q2, menthlth) %>% na.omit()
female_q2 <- select(female_q2, menthlth) %>% na.omit()
dim(male_q2)
## [1] 198066 1
dim(female_q2)
## [1] 285079 1
# plot male # of not good mental health days
male_q2_distribution = ggplot(data = male_q2, aes(x = menthlth)) +
geom_histogram(binwidth = 5) + ggtitle("Male Frequency of Bad Mental Health Days in a Month")
male_q2_distribution
# plot female # of not good mental health days
female_q2_distribution = ggplot(data = female_q2, aes(x = menthlth)) +
geom_histogram(binwidth = 5) + ggtitle("Female Frequency of Bad Mental Health Days in a Month")
female_q2_distribution
The distributions look pretty similar, but I also want to compare the average # of bad mental health days in a month for each group:
q2 <- select(brfss2013, sex, menthlth) %>% na.omit()
q2 %>%
group_by(sex) %>%
summarise(mean_days = mean(menthlth),
median_days = median(menthlth),
sd_days = sd(menthlth), n = n())
## # A tibble: 2 x 5
##   sex    mean_days median_days sd_days      n
##   <fct>      <dbl>       <dbl>   <dbl>  <int>
## 1 Male        2.78           0    7.13 198066
## 2 Female      3.78           0    8.05 285079
Although the male and female distributions look fairly similar, the female distribution appears to have a heavier right tail, which pulls the female mean (3.78 days) above the male mean (2.78 days). The median is 0 for both groups because more than half of the respondents in each group reported zero bad mental health days. The median is also a more robust statistic than the mean, meaning it is less affected by skewness.
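The difference between the mean and the median is easy to see on a small made-up example. The values below are invented for illustration and are not BRFSS responses:

```r
# Made-up responses: most people report 0 days, a few report many.
days <- c(rep(0, 7), 2, 15, 30)

mean(days)       # pulled upward by the few large values
median(days)     # stays at 0 because more than half of the values are 0
mean(days == 0)  # share of respondents reporting zero days
```

Running mean(q2$menthlth == 0) by sex on the real data would show how much of each group reported no bad days at all, which is what keeps both medians at 0.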
# create map of US states
all_states <- map_data("state")
# select the relevant columns from brfss dataframe
q2_state <- select(brfss2013, X_state, menthlth) %>% na.omit()
# calculate the mean number of mental health days by state
# using the state mean, calculate the mean % of bad mental health days
q2_state_menthlth <- q2_state %>%
filter(menthlth<=30) %>%
group_by(X_state) %>%
summarise(mean_days = mean(menthlth),n = n()) %>%
mutate(pct_bad_menthlth_days = (mean_days/30)*100)
# create the mental health map and merge with the US states map
menthlth_states_map <- q2_state_menthlth %>%
mutate(region= tolower(X_state))
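states_map <- merge(all_states, menthlth_states_map, by="region")
# merge() can reorder rows, so restore the polygon drawing order
states_map <- states_map[order(states_map$order), ]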
# plot the heatmap
ggplot() +
geom_polygon(data = states_map, aes(x= long, y= lat, group = group, fill = pct_bad_menthlth_days), color = "white") +
ggtitle("Heat Map of % Bad Mental Health Days in the US") +
scale_fill_gradient(low = "white", high = "darkred") +
theme(legend.position = c(1,0), legend.justification = c(1,0))
States with the highest average percent of bad mental health days appear to be:
- Alabama
- Kentucky
- West Virginia
# print a table of the top 10 states with highest percent bad mental health days
head(arrange(q2_state_menthlth, desc(pct_bad_menthlth_days)), n=10)
## # A tibble: 10 x 4
##    X_state       mean_days     n pct_bad_menthlth_days
##    <fct>             <dbl> <int>                 <dbl>
##  1 Alabama            4.44  6334                  14.8
##  2 West Virginia      4.40  5798                  14.7
##  3 Kentucky           4.31 10717                  14.4
##  4 Oklahoma           4.01  8114                  13.4
##  5 Arkansas           3.92  5139                  13.1
##  6 Puerto Rico        3.91  5945                  13.0
##  7 Mississippi        3.88  7298                  12.9
##  8 Oregon             3.85  5861                  12.8
##  9 Tennessee          3.85  5703                  12.8
## 10 California         3.76 11410                  12.5
This table shows the 10 states and territories with the highest average percent of bad mental health days in a month.
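The table includes Puerto Rico, which should not appear on the heat map because the state map data covers only the contiguous United States, and merge() silently drops any region without a match on both sides. A quick way to see what is lost is to compare the two sets of names before merging. The vectors below are hypothetical stand-ins, not the real X_state levels or map regions:

```r
# Hypothetical stand-ins: a few names as they appear in X_state, and a few
# lowercase region names of the kind the state map table uses.
survey_states <- c("Alabama", "Kentucky", "Puerto Rico", "Oregon")
map_regions   <- c("alabama", "kentucky", "oregon", "texas")

survey_regions <- tolower(survey_states)

# In the summary but not on the map: these rows are dropped by the merge
missing_from_map <- setdiff(survey_regions, map_regions)
missing_from_map

# On the map but not in the summary: these polygons are dropped by the merge
missing_from_survey <- setdiff(map_regions, survey_regions)
missing_from_survey
```

On the real data the equivalent check would be setdiff(menthlth_states_map$region, unique(all_states$region)) and the reverse.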
Research question 3: Are females more likely to report good general health despite having more bad mental health days?
# subset the dataframe to metrics of interest, remove NA's, remove mental health outliers
q3 <- select(brfss2013, sex, menthlth, genhlth)
q3_cleaned <- select(q3, sex, menthlth, genhlth) %>% na.omit() %>% filter(menthlth <= 30)
dim(q3_cleaned)
## [1] 481487 3
# assign low, medium & high mental health days categories to
# each individual as we did in question 1
q3_cleaned$menthlth_group<-ifelse(q3_cleaned$menthlth < 5, "low # days",
ifelse(q3_cleaned$menthlth>=5 & q3_cleaned$menthlth<10,"medium # days","high # days"))
# plot the proportion of participants in each mental health group (low, med, high days)
# who reported each general health rating and segment by gender
g3 <- ggplot(q3_cleaned) + aes(x=sex, fill=genhlth) + geom_bar(position = "fill") + facet_grid(.~menthlth_group)
g3 <- g3 + xlab("Mental Health Category by Gender") + ylab("Proportion") + scale_fill_discrete(name="Reported General Health")
g3
Although reported general health varies across the low, medium, and high groups of bad mental health days, it does not seem to vary much by gender. Among participants who reported a high number of bad mental health days, females were slightly less likely than males to report poor general health, but this difference may be due to sampling error, and a statistical test would be needed to determine whether the effect of gender is real.
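One simple form such a test could take is a two-sample test of proportions comparing the share of males and females in the high group who rated their general health as poor. The counts below are hypothetical, chosen only to show the mechanics, and the sketch ignores the survey's sampling weights:

```r
# Hypothetical counts (NOT from brfss2013): respondents in the high group
# who rated their general health "Poor", and the group totals, by sex.
poor_counts  <- c(Male = 420, Female = 610)
group_totals <- c(Male = 2000, Female = 3300)

# Two-sample test for equality of proportions
result <- prop.test(x = poor_counts, n = group_totals)
result$estimate   # observed proportion reporting poor health in each group
result$p.value    # evidence against equal proportions
result$conf.int   # confidence interval for the difference in proportions
```

The real counts could be taken from table(q3_cleaned$sex, q3_cleaned$genhlth) after filtering to the high group.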