Exploring the BRFSS data

Setup

Load packages

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 3.4.3

library(dplyr)

## Warning: package 'dplyr' was built under R version 3.4.3

Load data

load("brfss2013.RData")
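As a minimal sketch of what `load()` does here: it restores objects under the names they were saved with, which is why the data frame is available as `brfss2013` afterwards. The example uses a small stand-in data frame with made-up values and a temporary file; both are assumptions for illustration, not the real BRFSS file.

```r
# Hypothetical stand-in for the real data: three made-up respondents.
toy_brfss <- data.frame(
  X_state  = c("Utah", "Utah", "Guam"),
  avedrnk2 = c(2, NA, 3),
  menthlth = c(0, 5, 10)
)

# Save it to a temporary .RData file, then remove it from the workspace.
toy_path <- tempfile(fileext = ".RData")
save(toy_brfss, file = toy_path)
rm(toy_brfss)

# load() restores the object under its saved name and invisibly
# returns the names of the objects it created.
loaded_names <- load(toy_path)
loaded_names                          # "toy_brfss"

# Check that the columns used later in the analysis are present.
needed <- c("X_state", "avedrnk2", "menthlth")
all(needed %in% names(toy_brfss))     # TRUE
```

For the real file, the same check, `all(needed %in% names(brfss2013))`, would confirm that the three variables used below are present before any filtering.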

Part 1: Data

Health-related risk behaviour data were collected by telephone survey. I believe that these data are generalizable because:

The surveys are conducted in all 50 states, D.C., and three US territories.
The surveys are conducted using the Random Digit Dialing (RDD) technique, which indicates that random sampling was employed.

Causality, however, cannot be determined as these data are observational, and no actual experiments were designed or conducted.

Part 2: Research questions

Research question 1:

What is the mean alcohol consumption per region?

Research question 2:

What is the average level of mental health problems in each region?

Research question 3:

Is there a correlation between the levels of alcohol consumption and mental health problems?

Part 3: Exploratory data analysis

Research question 1: The first step was to create a new dataframe (called “brfss2013_new”) that keeps only the rows in which neither “avedrnk2” nor “menthlth” is NA.

brfss2013_new <- brfss2013 %>% filter(!is.na(menthlth), !is.na(avedrnk2))
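To make the filtering step concrete, here is a small sketch on made-up data (the values and the name `toy` are hypothetical). It shows that `filter()` with two `!is.na()` conditions keeps every column but drops any row in which either variable is missing.

```r
library(dplyr)

# Hypothetical respondents; NA marks a missing answer.
toy <- data.frame(
  X_state  = c("Utah", "Utah", "Guam", "Guam"),
  avedrnk2 = c(2, NA, 3, 1),
  menthlth = c(0, 5, NA, 10)
)

# Conditions separated by commas are combined with AND,
# so a row survives only if both values are present.
toy_new <- toy %>% filter(!is.na(menthlth), !is.na(avedrnk2))

nrow(toy)                    # 4 rows before
nrow(toy_new)                # 2 rows after
nrow(toy) - nrow(toy_new)    # 2 rows dropped
```

Comparing `nrow()` before and after the filter in this way shows how much of the sample the later averages are based on. Those averages then describe only respondents who answered both items.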

I then created a new variable called “total_drink”, an estimate of the total number of drinks consumed per person per month. To calculate “total_drink” I multiplied “avedrnk2” (which is the average number of alcoholic drinks per day in the past 30 days) by 30. This treats the daily average as applying to all 30 days.

brfss2013_new <- brfss2013_new %>% mutate(total_drink = avedrnk2 * 30)
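The arithmetic behind “total_drink” is a simple scaling, sketched here on made-up values. The name `days_in_window` is introduced for illustration only. If the survey item refers only to days on which the respondent drank (the codebook wording is worth checking), the product is better read as an upper bound than as an actual monthly total.

```r
library(dplyr)

# Hypothetical daily averages for three respondents.
toy <- data.frame(
  X_state  = c("Utah", "Guam", "Guam"),
  avedrnk2 = c(1, 2.5, 4)
)

# Assumption: the daily average holds on each of the 30 days.
days_in_window <- 30

toy <- toy %>% mutate(total_drink = avedrnk2 * days_in_window)

toy$total_drink   # 30 75 120
```

Keeping the multiplier in a named constant makes the assumption visible and easy to change if a different window is wanted.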

Next I calculated the average alcohol consumed (i.e. average “total_drink”) per state (a new variable called “ave_alc”) and stored these values in the dataframe “state_alc”.

state_alc <- brfss2013_new %>% group_by(X_state) %>% summarise(ave_alc = mean(total_drink))
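As a sketch of the grouping step, this example computes a per-region mean on made-up data (the values and object names are hypothetical). Each region's mean uses only that region's rows.

```r
library(dplyr)

# Hypothetical respondent-level data after filtering.
toy_new <- data.frame(
  X_state     = c("Utah", "Utah", "Guam", "Guam", "Guam"),
  total_drink = c(30, 90, 60, 60, 120)
)

state_alc_toy <- toy_new %>%
  group_by(X_state) %>%
  summarise(ave_alc = mean(total_drink))

state_alc_toy
# Guam: (60 + 60 + 120) / 3 = 80
# Utah: (30 + 90) / 2       = 60
```

The result has one row per region, which is the shape the bar chart in the next step expects.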

I then plotted “state_alc”, i.e. the average amount of alcohol consumed (“ave_alc”) per state. I also included a horizontal line at the mean of “ave_alc” across all the regions, so that it is visually clear which states have an “above average” level of alcohol consumption. Finally, I rotated the x-axis labels for improved readability.

ggplot(state_alc, aes(X_state, ave_alc)) + geom_col() + geom_hline(aes(yintercept = mean(ave_alc))) + theme(axis.text.x = element_text(angle = 90, hjust = 1))
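The reference line in this plot is the mean of the per-region values, so every region counts once regardless of how many respondents it has. A sketch of the same kind of plot on made-up data follows, with the line height computed once outside `aes()`. This is an alternative way of writing it, not the code used above; the data and the names `toy_state_alc` and `overall_mean` are hypothetical.

```r
library(ggplot2)

# Hypothetical per-region averages.
toy_state_alc <- data.frame(
  X_state = c("Guam", "Hawaii", "Utah"),
  ave_alc = c(110, 80, 50)
)

# Unweighted mean of the per-region averages: (110 + 80 + 50) / 3 = 80
overall_mean <- mean(toy_state_alc$ave_alc)

p <- ggplot(toy_state_alc, aes(X_state, ave_alc)) +
  geom_col() +
  geom_hline(yintercept = overall_mean) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# print(p) would draw the chart.
overall_mean   # 80
```

Storing the value in a variable also makes it available for later steps, such as counting how many regions fall above it.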

This graph indicates that the majority of regions have a below average level of alcohol consumption, which implies that a few regions have alcohol consumption that is well above average.
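The statement that most regions sit below the average can be checked by counting rather than read off the chart. A minimal sketch on made-up values follows; the vector is hypothetical and was chosen to include two unusually high regions.

```r
# Hypothetical per-region averages with two high values.
ave_alc <- c(60, 62, 65, 66, 68, 70, 110, 140)

mean_alc <- mean(ave_alc)              # 641 / 8 = 80.125
n_below  <- sum(ave_alc < mean_alc)
n_above  <- sum(ave_alc > mean_alc)

c(below = n_below, above = n_above)    # below = 6, above = 2
```

Applied to `state_alc$ave_alc`, the same two sums would give the exact split behind the statement above.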

This analysis is important because it identifies regions that show high levels of alcohol consumption, thereby identifying the most “at risk” regions.

Research question 2: To address my second question, I performed a similar analysis to the one for Question 1.

As with the previous analysis, I calculated the average number of days of poor mental health (i.e. average “menthlth”) per state (a new variable called “ave_ment”) and stored these values in the dataframe “state_ment”. The “menthlth” variable corresponds to the number of days a respondent said they experienced poor mental health in the previous 30 days.

state_ment <- brfss2013_new %>% group_by(X_state) %>% summarise(ave_ment = mean(menthlth))
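Because both summaries group the same data frame by the same variable, they could also be produced in a single pass, giving one table with both averages side by side. This is a sketch of that alternative on made-up data, not the approach taken above; `state_both_toy` and the values are hypothetical.

```r
library(dplyr)

# Hypothetical respondent-level data after filtering.
toy_new <- data.frame(
  X_state     = c("Utah", "Utah", "Guam", "Guam"),
  total_drink = c(30, 90, 60, 120),
  menthlth    = c(0, 4, 2, 8)
)

state_both_toy <- toy_new %>%
  group_by(X_state) %>%
  summarise(
    ave_alc  = mean(total_drink),
    ave_ment = mean(menthlth)
  )

state_both_toy
# Guam: ave_alc = 90, ave_ment = 5
# Utah: ave_alc = 60, ave_ment = 2
```

A combined table of this kind keeps each region's two averages on the same row, which is convenient when the two measures are compared later.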

I then plotted “state_ment”, i.e. the average number of days of poor mental health (“ave_ment”) per state. I also included a horizontal line at the mean of “ave_ment” across all the regions, so that it is visually clear which states have an “above average” level of mental health problems. Finally, I rotated the x-axis labels for improved readability.

ggplot(state_ment, aes(X_state, ave_ment)) + geom_col() + geom_hline(aes(yintercept = mean(ave_ment))) + theme(axis.text.x = element_text(angle = 90, hjust = 1))
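If the aim is to see at a glance which regions lie above the line, one option is to order the bars by height. This sketch does so with `reorder()` on made-up data. It is a possible variation rather than the plot shown above, and the values and object names are hypothetical.

```r
library(ggplot2)

# Hypothetical per-region averages (made-up values).
toy_state_ment <- data.frame(
  X_state  = c("Alabama", "Guam", "Utah", "Oregon"),
  ave_ment = c(4.4, 3.5, 3.7, 3.4)
)

# reorder() sorts the levels of X_state by ave_ment, lowest first.
ordered_states <- reorder(toy_state_ment$X_state, toy_state_ment$ave_ment)
levels(ordered_states)   # "Oregon" "Guam" "Utah" "Alabama"

p <- ggplot(toy_state_ment, aes(reorder(X_state, ave_ment), ave_ment)) +
  geom_col() +
  geom_hline(yintercept = mean(toy_state_ment$ave_ment)) +
  labs(x = "X_state") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# print(p) would draw the chart.
```

With the bars sorted, the regions above the reference line form one contiguous block at the right of the chart.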

This graph indicates that there are around the same number of regions that display “above average” and “below average” levels of mental health problems.
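A roughly even split of regions above and below the mean is what one would expect when the mean and the median are close together. A small sketch of that check on made-up values (the vector is hypothetical):

```r
# Hypothetical per-region averages, roughly symmetric.
ave_ment <- c(2.8, 3.0, 3.2, 3.4, 3.6, 3.8)

mean_ment   <- mean(ave_ment)      # 19.8 / 6 = 3.3
median_ment <- median(ave_ment)    # (3.2 + 3.4) / 2 = 3.3

# Share of regions above the mean: 3 of 6.
share_above <- mean(ave_ment > mean_ment)
share_above                        # 0.5
```

Run on `state_ment$ave_ment`, a share near 0.5 would support the reading of the chart given above.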

This perhaps suggests that mental health policy needs to take into account a larger number of regions, whereas alcohol-related health policy needs to be more focused on particularly “high risk” regions.

Research question 3: Finally, I wanted to see whether the regions that have high rates of alcohol consumption are also the regions that have high rates of mental health problems.

For this, I made a table that arranged alcohol consumption (“ave_alc”) in descending order.

state_alc %>% group_by(X_state) %>% arrange(desc(ave_alc))

## # A tibble: 53 x 2
## # Groups:   X_state [53]
##    X_state       ave_alc
##    <fct>           <dbl>
##  1 Puerto Rico     140  
##  2 Guam            111  
##  3 Hawaii           81.2
##  4 Utah             78.6
##  5 West Virginia    75.1
##  6 Mississippi      75.0
##  7 North Dakota     74.6
##  8 Kentucky         73.4
##  9 Wisconsin        73.3
## 10 Texas            72.5
## # ... with 43 more rows
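The printed tibble stops at ten rows because of the default print length. To work with the top regions as data, they can be pulled out explicitly. A sketch on made-up data, taking the top three for brevity (the values and the names `toy_state_alc` and `top_alc` are hypothetical):

```r
library(dplyr)

# Hypothetical per-region averages.
toy_state_alc <- data.frame(
  X_state = c("A", "B", "C", "D", "E"),
  ave_alc = c(70, 140, 65, 111, 81)
)

top_alc <- toy_state_alc %>%
  arrange(desc(ave_alc)) %>%   # highest first
  head(3) %>%                  # keep the first three rows
  pull(X_state) %>%            # extract the region column
  as.character()

top_alc   # "B" "D" "E"
```

With the real table, `head(10)` in place of `head(3)` would return the ten names printed above as a character vector.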

I then did the same for mental health (“ave_ment”).

state_ment %>% group_by(X_state) %>% arrange(desc(ave_ment))

## # A tibble: 53 x 2
## # Groups:   X_state [53]
##    X_state       ave_ment
##    <fct>            <dbl>
##  1 Alabama           4.35
##  2 West Virginia     4.05
##  3 Kentucky          3.74
##  4 Utah              3.71
##  5 Mississippi       3.65
##  6 Louisiana         3.63
##  7 Oklahoma          3.59
##  8 Arkansas          3.53
##  9 Guam              3.47
## 10 Oregon            3.47
## # ... with 43 more rows
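The overlap between the two printed lists can be computed rather than counted by eye. This sketch types in the ten region names from each table above and uses base R's `intersect()`. The vector names are introduced here for illustration.

```r
# Region names copied from the two printed tables above.
top_alc <- c("Puerto Rico", "Guam", "Hawaii", "Utah", "West Virginia",
             "Mississippi", "North Dakota", "Kentucky", "Wisconsin", "Texas")
top_ment <- c("Alabama", "West Virginia", "Kentucky", "Utah", "Mississippi",
              "Louisiana", "Oklahoma", "Arkansas", "Guam", "Oregon")

# intersect() keeps the elements found in both vectors,
# in the order in which they appear in the first one.
in_both <- intersect(top_alc, top_ment)

in_both
# "Guam" "Utah" "West Virginia" "Mississippi" "Kentucky"
length(in_both)   # 5
```

Typing the names in is only practical for short lists. With the full tables, the same call could be applied to the top-ten names extracted from each.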

Of the ten regions that have the highest levels of alcohol consumption, only five appear in the list of the top ten regions for mental health problems. This suggests that these two variables are not strongly correlated, although comparing top-ten lists is only a rough check. This is surprising, as I would have expected more agreement between these two lists.
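A more direct way to look at the third research question would be to put the two per-region averages in one table and compute a correlation coefficient. This sketch shows the idea on made-up numbers. It is not a result from the BRFSS data, and the toy tables and the names `both` and `r` are hypothetical. It uses base R's `merge()` to join on the region and `cor()` for Pearson's correlation.

```r
# Hypothetical per-region averages (made-up values).
toy_state_alc <- data.frame(
  X_state = c("A", "B", "C", "D", "E"),
  ave_alc = c(60, 70, 80, 90, 100)
)
toy_state_ment <- data.frame(
  X_state  = c("E", "D", "C", "B", "A"),
  ave_ment = c(4, 2, 3, 5, 1)
)

# Join the two tables on region so that each row is one region.
both <- merge(toy_state_alc, toy_state_ment, by = "X_state")

# Pearson correlation across regions (each region counts once).
r <- cor(both$ave_alc, both$ave_ment)
r   # 0.3 for these made-up values
```

On the real tables, the same two steps applied to `state_alc` and `state_ment` would give a single number between -1 and 1 to set beside the top-ten comparison. It would describe regions, not individual respondents.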