library(ggplot2)
library(dplyr)
load("brfss2013.RData")
Health-related risk behaviour data were collected through telephone surveys. I believe that these data are generalizable because:
The surveys are conducted in all 50 states, the District of Columbia (D.C.), and three US territories.
The surveys are conducted using the Random Digit Dialing (RDD) technique, which means that respondents are selected by random sampling.
Causality, however, cannot be established, because these data are observational: no experiment with random assignment was designed or conducted.
Research question 1:
What is the mean alcohol consumption per region?
Research question 2:
What is the average level of mental health problems in each region?
Research question 3:
Is there a correlation between the levels of alcohol consumption and mental health problems?
Research question 1: The first step was to create a new dataframe (called “brfss2013_new”) that keeps only the respondents with non-missing values for both “avedrnk2” and “menthlth”.
brfss2013_new <- brfss2013 %>% filter(!is.na(menthlth), !is.na(avedrnk2))
I then created a new variable, “total_drink”, giving the total number of drinks consumed per person per month. To calculate “total_drink”, I multiplied “avedrnk2” (labelled in the codebook as the average number of alcoholic drinks per day in the past 30 days) by 30.
brfss2013_new <- brfss2013_new %>% mutate(total_drink = avedrnk2 * 30)
Next, I calculated the average monthly alcohol consumption (i.e. the mean of “total_drink”) per state, stored as a new variable “ave_alc” in the dataframe “state_alc”.
state_alc <- brfss2013_new %>% group_by(X_state) %>% summarise(ave_alc = mean(total_drink))
Finally, I plotted “state_alc”, i.e. the average amount of alcohol consumed (“ave_alc”) per state, with a horizontal line at the mean of “ave_alc” across all regions. This makes it visually clear which states have above-average alcohol consumption. I also rotated the x-axis labels for readability.
ggplot(state_alc, aes(X_state, ave_alc)) +
  geom_col() +
  geom_hline(yintercept = mean(state_alc$ave_alc)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
This graph shows that the majority of regions have below-average alcohol consumption. Because the mean can only lie above most of the values when a few values are much larger, this indicates that a small number of regions have alcohol consumption well above average.
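The reasoning above (most regions below the mean, so a few regions must lie far above it) is the signature of a right-skewed distribution, where a handful of large values pull the mean above the median. A minimal base-R sketch of that check follows; `skew_summary()` is a hypothetical helper and the numbers in `toy_ave_alc` are made up for illustration, not BRFSS results. With the data loaded, `skew_summary(state_alc$ave_alc)` would run the same check on the real state averages.

```r
# Hypothetical state-level averages (made up for illustration, not BRFSS
# results): most values are moderate, two are much larger.
toy_ave_alc <- c(60, 62, 63, 65, 66, 68, 70, 110, 140)

# Compare the mean with the median and count regions on each side of the mean
skew_summary <- function(x) {
  m <- mean(x)
  list(mean = m, median = median(x),
       n_below_mean = sum(x < m), n_above_mean = sum(x > m))
}

s <- skew_summary(toy_ave_alc)
str(s)
```

When the mean sits clearly above the median, as in this toy case, a below-average majority reflects the skew of the distribution rather than anything unusual about those regions.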
This analysis is important because it highlights the regions with the highest alcohol consumption, which are potentially the most “at risk”.
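One caveat on “total_drink”: multiplying “avedrnk2” by 30 assumes that every respondent drank on all 30 days. In the BRFSS questionnaire the average-drinks question refers to the days on which the respondent drank, so a possibly more realistic monthly total multiplies it by the number of drinking days, reported in “alcday5”. The sketch below converts “alcday5” to days in the past 30, assuming the BRFSS codebook coding (101–107 = days per week, 201–230 = days in the past 30, 888 = no drinks, other codes treated as missing). `drinking_days()` is a hypothetical helper, and the coding should be checked against the codebook distributed with brfss2013.RData before use.

```r
# Hypothetical helper: convert BRFSS-style "alcday5" codes to an approximate
# number of drinking days in the past 30 days.
# Assumed coding (verify against the codebook):
#   101-107 -> days per week (scaled to a 30-day month)
#   201-230 -> days in the past 30 days
#   888     -> no drinks in the past 30 days
#   anything else (e.g. 777 "don't know", 999 "refused") -> NA
drinking_days <- function(alcday5) {
  days <- rep(NA_real_, length(alcday5))
  weekly <- !is.na(alcday5) & alcday5 >= 101 & alcday5 <= 107
  monthly <- !is.na(alcday5) & alcday5 >= 201 & alcday5 <= 230
  none <- !is.na(alcday5) & alcday5 == 888
  days[weekly] <- (alcday5[weekly] - 100) * 30 / 7
  days[monthly] <- alcday5[monthly] - 200
  days[none] <- 0
  days
}

# Made-up codes: 3 days/week, 15 days/month, no drinks, refused
drinking_days(c(103, 215, 888, 999))
```

Assuming “alcday5” is stored as a number, the monthly total could then be computed with `mutate(total_drink = avedrnk2 * drinking_days(alcday5))`.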
Research question 2: To address my second question, I performed an analysis similar to the one for Question 1.
As with the previous analysis, I calculated the average number of days of poor mental health (i.e. the mean of “menthlth”) per state, stored as a new variable “ave_ment” in the dataframe “state_ment”. The “menthlth” variable is the number of days in the previous 30 on which a respondent reported poor mental health. Because “brfss2013_new” was filtered on both variables, these averages only include respondents with a non-missing “avedrnk2” value.
state_ment <- brfss2013_new %>% group_by(X_state) %>% summarise(ave_ment = mean(menthlth))
As before, I plotted “state_ment”, i.e. the average number of days of poor mental health (“ave_ment”) per state, with a horizontal line at the mean of “ave_ment” across all regions, so that states with above-average levels of mental health problems stand out. I again rotated the x-axis labels for readability.
ggplot(state_ment, aes(X_state, ave_ment)) +
  geom_col() +
  geom_hline(yintercept = mean(state_ment$ave_ment)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
This graph shows that roughly as many regions have above-average levels of mental health problems as have below-average levels.
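The horizontal reference line in both plots is the unweighted mean of the state averages, so each region counts once regardless of how many respondents it has. The pooled mean over all respondents can differ when sample sizes vary across states. A minimal base-R sketch comparing the two, using a small made-up respondent-level table (`toy`) in place of “brfss2013_new”:

```r
# Hypothetical respondent-level data (illustrative only):
# state A has many respondents, state B has few.
toy <- data.frame(
  state = c(rep("A", 8), rep("B", 2)),
  menthlth = c(rep(2, 8), rep(10, 2))
)

# Per-state means (analogous to "ave_ment" in state_ment)
state_means <- tapply(toy$menthlth, toy$state, mean)

# Unweighted: every state counts once (what the reference line shows)
unweighted <- mean(state_means)

# Pooled: every respondent counts once
pooled <- mean(toy$menthlth)

c(unweighted = unweighted, pooled = pooled)
```

Either line is defensible: the unweighted version asks whether a region is above the typical region, while the pooled version asks whether it is above the average respondent.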
Taken together, the two graphs suggest that mental health policy may need to take a larger number of regions into account, whereas alcohol-related health policy could focus more narrowly on a few particularly “high risk” regions.
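Because “menthlth” counts days out of the previous 30, valid answers should lie between 0 and 30. Before averaging, it may be worth checking for values outside that range, which would point to coding errors and could distort a state mean. A minimal sketch, assuming any value outside 0–30 should be treated as invalid; `clean_days()` is a hypothetical helper and the values in `x` are made up:

```r
# Hypothetical helper: keep day counts inside the valid 0-30 range and
# replace anything else with NA, so mean(..., na.rm = TRUE) skips it.
clean_days <- function(x, lower = 0, upper = 30) {
  ifelse(!is.na(x) & x >= lower & x <= upper, x, NA)
}

# Made-up example: 99 is outside the allowed range
x <- c(0, 5, 30, 99, NA)

# The minimum and maximum reveal any out-of-range values
range(x, na.rm = TRUE)

cleaned <- clean_days(x)
mean(cleaned, na.rm = TRUE)
```

With the real data, this could be applied as `mutate(menthlth = clean_days(menthlth))` before computing “state_ment”, in which case the state means would need `mean(menthlth, na.rm = TRUE)`.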
Research question 3: Finally, I wanted to see whether the regions with high rates of alcohol consumption are also the regions with high rates of mental health problems.
For this, I sorted the state-level table by alcohol consumption (“ave_alc”) in descending order.
state_alc %>% group_by(X_state) %>% arrange(desc(ave_alc))
## # A tibble: 53 x 2
## # Groups: X_state [53]
## X_state ave_alc
## <fct> <dbl>
## 1 Puerto Rico 140
## 2 Guam 111
## 3 Hawaii 81.2
## 4 Utah 78.6
## 5 West Virginia 75.1
## 6 Mississippi 75.0
## 7 North Dakota 74.6
## 8 Kentucky 73.4
## 9 Wisconsin 73.3
## 10 Texas 72.5
## # ... with 43 more rows
I then did the same for mental health (“ave_ment”).
state_ment %>% group_by(X_state) %>% arrange(desc(ave_ment))
## # A tibble: 53 x 2
## # Groups: X_state [53]
## X_state ave_ment
## <fct> <dbl>
## 1 Alabama 4.35
## 2 West Virginia 4.05
## 3 Kentucky 3.74
## 4 Utah 3.71
## 5 Mississippi 3.65
## 6 Louisiana 3.63
## 7 Oklahoma 3.59
## 8 Arkansas 3.53
## 9 Guam 3.47
## 10 Oregon 3.47
## # ... with 43 more rows
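Of the ten regions with the highest alcohol consumption, five (Guam, Utah, West Virginia, Mississippi and Kentucky) also appear among the ten regions with the most days of poor mental health. If the two rankings were unrelated, we would expect only about two shared regions (10 × 10 / 53 ≈ 1.9), so this overlap hints at a positive association rather than none. However, comparing top-ten lists is not a formal test of correlation, so this result should be treated as tentative.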
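A more direct answer to Research question 3 would use all regions rather than only the top ten: join the two state-level tables and compute a rank (Spearman) correlation between “ave_alc” and “ave_ment”. The helper `state_correlation()` below is a hypothetical base-R sketch; once the data are loaded, “state_alc” and “state_ment” would be passed in, and the toy tables here are made up purely to show the call.

```r
# Hypothetical helper: correlate two state-level summaries, matched by state
state_correlation <- function(alc, ment, method = "spearman") {
  # Keep only states present in both tables
  joined <- merge(alc, ment, by = "X_state")
  test <- cor.test(joined$ave_alc, joined$ave_ment, method = method)
  list(n_states = nrow(joined),
       estimate = unname(test$estimate),
       p_value = test$p.value)
}

# Made-up toy tables (not BRFSS data); rows deliberately in different orders
toy_state_alc <- data.frame(X_state = c("A", "B", "C", "D", "E"),
                            ave_alc = c(50, 60, 70, 80, 90))
toy_state_ment <- data.frame(X_state = c("E", "D", "C", "B", "A"),
                             ave_ment = c(4.0, 3.5, 3.1, 2.8, 2.0))

r <- state_correlation(toy_state_alc, toy_state_ment)
str(r)
```

Spearman's coefficient depends only on the ordering of states, which matches the ranking-based reasoning above; `method = "pearson"` would measure linear association instead. As noted earlier, any correlation found here is observational and cannot establish causality in either direction.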