The Behavioral Risk Factor Surveillance System (BRFSS) is a collaborative project between all of the states in the United States (US) and participating US territories and the Centers for Disease Control and Prevention (CDC). The BRFSS is administered and supported by CDC’s Population Health Surveillance Branch, under the Division of Population Health at the National Center for Chronic Disease Prevention and Health Promotion. BRFSS is an ongoing surveillance system designed to measure behavioral risk factors for the non-institutionalized adult population (18 years of age and older) residing in the US. The BRFSS was initiated in 1984, with 15 states collecting surveillance data on risk behaviors through monthly telephone interviews. Over time, the number of states participating in the survey increased; by 2001, 50 states, the District of Columbia, Puerto Rico, Guam, and the US Virgin Islands were participating in the BRFSS. Today, all 50 states, the District of Columbia, Puerto Rico, and Guam collect data annually and American Samoa, Federated States of Micronesia, and Palau collect survey data over a limited point-in-time (usually one to three months). In this document, the term “state” is used to refer to all areas participating in BRFSS, including the District of Columbia, Guam, and the Commonwealth of Puerto Rico.
The BRFSS objective is to collect uniform, state-specific data on preventive health practices and risk behaviors that are linked to chronic diseases, injuries, and preventable infectious diseases that affect the adult population. Factors assessed by the BRFSS in 2013 include tobacco use, HIV/AIDS knowledge and prevention, exercise, immunization, health status, healthy days (health-related quality of life), health care access, inadequate sleep, hypertension awareness, cholesterol awareness, chronic health conditions, alcohol consumption, fruits and vegetables consumption, arthritis burden, and seatbelt use. Since 2011, BRFSS has conducted both landline telephone- and cellular telephone-based surveys. In conducting the BRFSS landline telephone survey, interviewers collect data from a randomly selected adult in a household. In conducting the cellular telephone version of the BRFSS questionnaire, interviewers collect data from an adult who participates by using a cellular telephone and resides in a private residence or college housing.
Health characteristics estimated from the BRFSS pertain to the non-institutionalized adult population, aged 18 years or older, who reside in the US. In 2013, additional question sets were included as optional modules to provide a measure for several childhood health and wellness indicators, including asthma prevalence for people aged 17 years or younger.
library(ggplot2)
library(dplyr)
library(plotly)
library(tidyr)
load("brfss2013.RData")
Data collection procedures for the brfss2013 dataset are documented in the BRFSS Data User’s Guide, which can be found at https://www.cdc.gov/brfss/data_documentation/index.htm. Three components of the process (survey protocol, sample design, and weighting) are explained in detail in this guide. Together they are intended to ensure that the BRFSS data are representative of a random sample, so that most, if not all, calculated statistics are generalizable to a much larger population.
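To make the role of weighting concrete, here is a minimal sketch of how a weighted proportion can differ from an unweighted one. The toy rows and the weight column `wt` are hypothetical and are not taken from brfss2013; the analyses in this report use unweighted counts.

```r
# Hypothetical toy responses with a made-up survey weight column 'wt'.
# Each weight says how many people in the population a respondent stands for.
toy <- data.frame(
  exerany2 = c("Yes", "Yes", "No", "No"),
  wt       = c(1, 1, 3, 3)
)

# 1 if the respondent exercised, 0 otherwise
exercised <- as.numeric(toy$exerany2 == "Yes")

# Unweighted share treats every respondent equally
unweighted <- mean(exercised)

# Weighted share lets heavily weighted respondents count for more
weighted <- weighted.mean(exercised, w = toy$wt)

print(c(unweighted = unweighted, weighted = weighted))
```

In this toy case the two "No" respondents carry most of the weight, so the weighted share of exercisers is lower than the unweighted share.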
Research question 1:
How does exercise affect a person’s health? The goal here is to determine whether exercise positively or negatively impacts a person’s health.
Research question 2:
Do respondents who classify themselves as ‘healthy’ either smoke cigarettes or drink alcohol? We want to understand the positive or negative effects of smoking and alcohol consumption on a person’s health.
Research question 3:
What are some of the best habits or activities in which a person can engage to improve their overall health? It’s a given that most people want to improve their health but they, more often than not, are either confused or uncertain as to what to do to achieve this goal.
Research question 1: “How does exercise affect a person’s health? The goal here is to determine whether exercise positively or negatively impacts a person’s health.”
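Let’s plot a histogram of the general health variable, genhlth. Because genhlth is a categorical variable with five ordered levels, the histogram is effectively a bar chart of counts per level. Upon visual inspection, the responses are concentrated in the middle categories, so the general health conditions of the individuals in the data set follow a roughly bell-shaped (‘normal’-looking) distribution.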
fig <- plot_ly(x=brfss2013$genhlth,type="histogram",nbinsx=50) %>%
layout(autosize = F, width = 800, height = 400,
title="Histogram of General Health 'genhlth' Feature",
xaxis=list(title="Health Condition"),yaxis=list(title="Number of Instances"),
legend = list(x = 5, y = 1))
fig
The key dataset variables related to exercise that we will investigate are the following:
exerany2: Exercise In Past 30 Days
exract11: Type Of Physical Activity
exract21: Other Type Of Physical Activity Giving Most Exercise During Past Month
Let’s subset the dataset using genhlth and these exercise features.
exercise <- brfss2013 %>% select(genhlth,exerany2,exract11,exract21)
head(exercise)
## genhlth exerany2 exract11
## 1 Fair No <NA>
## 2 Good Yes Walking
## 3 Good No <NA>
## 4 Very good Yes Walking
## 5 Good No <NA>
## 6 Very good Yes Bicycling machine exercise
## exract21
## 1 <NA>
## 2 Household Activities (vacuuming, dusting, home repair, etc.)
## 3 <NA>
## 4 No other activity
## 5 <NA>
## 6 Gardening (spading, weeding, digging, filling)
Let’s group the exercise dataframe by genhlth and determine whether the person has engaged in any exercise in the last 30 days. Also we will remove rows in exercise where genhlth is NA.
exercise_last_30_days <- exercise %>% group_by(genhlth,exerany2) %>%
tally() %>% filter(!is.na(genhlth))
head(exercise_last_30_days)
## # A tibble: 6 x 3
## # Groups: genhlth [2]
## genhlth exerany2 n
## <fct> <fct> <int>
## 1 Excellent Yes 67914
## 2 Excellent No 11453
## 3 Excellent <NA> 6115
## 4 Very good Yes 120595
## 5 Very good No 28860
## 6 Very good <NA> 9621
We create a grouped column chart of health conditions and counts of whether individuals exercised in the last 30 days or not.
fig <- plot_ly(data=exercise_last_30_days, x = ~genhlth, y = ~n, color = ~exerany2, type = 'bar') %>%
layout(autosize = F, width = 800, height = 400,
title="Exercised in Last Thirty Days? (Y/N) vs. General Health",
xaxis=list(title="Health Condition"),yaxis=list(title="Count"))
fig
As expected, among individuals who report being in ‘Fair’ or better health, those who exercised in the last 30 days outnumber those who did not.
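Because the health categories differ in size, shares within each category are easier to compare than raw counts. The sketch below uses only the ‘Excellent’ and ‘Very good’ counts printed in the head() output above and leaves out the NA answers; the data frame `counts` is built by hand for illustration.

```r
library(dplyr)

# Counts copied from the head() output above (Excellent and Very good only;
# the other categories are not shown there, so they are omitted here).
counts <- data.frame(
  genhlth  = c("Excellent", "Excellent", "Very good", "Very good"),
  exerany2 = c("Yes", "No", "Yes", "No"),
  n        = c(67914, 11453, 120595, 28860)
)

# Share of Yes/No answers within each health category (NA answers excluded)
shares <- counts %>%
  group_by(genhlth) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

print(shares)
```

Under these counts, the share of respondents who exercised is somewhat higher in the ‘Excellent’ category than in the ‘Very good’ category.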
Finally, let’s get an idea of the most popular types of activity that individuals in this dataset do.
exercise_activities <- exercise %>% unite("Activities", exract11, exract21) %>% group_by(genhlth, Activities) %>% tally() %>% filter(!is.na(genhlth)) %>% arrange(genhlth, desc(n))
# Drop the all-missing "NA_NA" combination first, then keep the top three per health category
exercise_top_3_activities <- exercise_activities %>% filter(Activities != "NA_NA") %>% group_by(genhlth) %>% slice(1:3)
exercise_top_3_activities
## # A tibble: 15 x 3
## # Groups: genhlth [5]
## genhlth Activities n
## <fct> <chr> <int>
## 1 Excellent Walking_No other activity 10468
## 2 Excellent Walking_Gardening (spading, weeding, digging, filling) 2265
## 3 Excellent Running_Weight lifting 1972
## 4 Very good Walking_No other activity 24183
## 5 Very good Walking_Gardening (spading, weeding, digging, filling) 4955
## 6 Very good Walking_Household Activities (vacuuming, dusting, home repai… 3632
## 7 Good Walking_No other activity 26068
## 8 Good Walking_Gardening (spading, weeding, digging, filling) 3592
## 9 Good Walking_Household Activities (vacuuming, dusting, home repai… 3525
## 10 Fair Walking_No other activity 12012
## 11 Fair Walking_Household Activities (vacuuming, dusting, home repai… 1629
## 12 Fair Walking_Other 1334
## 13 Poor Walking_No other activity 4448
## 14 Poor Walking_Household Activities (vacuuming, dusting, home repai… 571
## 15 Poor Walking_Other 490
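The "NA_NA" label that the code filters out comes from how unite() treats missing values by default. A minimal sketch on hypothetical toy rows (not taken from brfss2013) shows the behaviour:

```r
library(tidyr)

# Hypothetical toy rows, not taken from brfss2013
toy <- data.frame(
  exract11 = c("Walking", NA, "Running"),
  exract21 = c("No other activity", NA, NA),
  stringsAsFactors = FALSE
)

# With the default na.rm = FALSE, missing values are pasted in as the text "NA",
# so a row with both activities missing becomes the string "NA_NA".
united <- unite(toy, "Activities", exract11, exract21)

print(united$Activities)
```

Respondents who reported no activity therefore form their own "NA_NA" group, which has to be removed before ranking the activity combinations.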
We create a grouped column chart of the top three exercise activity combinations for each health condition.
fig <- plot_ly(data=exercise_top_3_activities, x = ~genhlth, y = ~n, color = ~Activities, type = 'bar') %>%
layout(autosize = F, width = 800, height = 400,
title="Top Three Exercise Activities vs. General Health",
xaxis=list(title="Health Condition"),yaxis=list(title="Count"),
legend=list(orientation = "v", x = 1.75, y = 0.8))
fig
Research question 2: “Do respondents who classify themselves as ‘healthy’ either smoke cigarettes or drink alcohol? We want to understand the positive or negative effects of smoking and alcohol consumption on a person’s health.”
The features in the dataset related to smoking and alcohol consumption are found in the following columns:
Main Survey - Section 9 - Tobacco Use
smoke100: Smoked At Least 100 Cigarettes
Main Survey - Section 10 - Alcohol Consumption
alcday5: Days In Past 30 Had Alcoholic Beverage
avedrnk2: Avg Alcoholic Drinks Per Day In Past 30
Let’s subset the brfss2013 dataset using genhlth and these features.
smoke_drink <- brfss2013 %>% select(genhlth,smoke100,alcday5,avedrnk2)
We will use the variable smoke100 to determine if an individual has ever smoked.
smoking_no_smoking <- smoke_drink %>% group_by(genhlth,smoke100) %>% tally() %>%
filter(!is.na(genhlth),!is.na(smoke100))
head(smoking_no_smoking)
## # A tibble: 6 x 3
## # Groups: genhlth [3]
## genhlth smoke100 n
## <fct> <fct> <int>
## 1 Excellent Yes 28342
## 2 Excellent No 54392
## 3 Very good Yes 63642
## 4 Very good No 91157
## 5 Good Yes 69765
## 6 Good No 75995
We create a grouped column chart of health conditions and counts of whether individuals have ever smoked or not.
fig <- plot_ly(data=smoking_no_smoking, x = ~genhlth, y = ~n, color = ~smoke100, type = 'bar') %>%
layout(autosize = F, width = 800, height = 400,
title="Has Individual Ever Smoked? (Y/N) vs. General Health",
xaxis=list(title="Health Condition"),yaxis=list(title="Count"))
fig
We observe from the bar plot above that the number of smokers rises steadily from the ‘poor’ to the ‘good’ health category and then drops for those in the ‘very good’ and ‘excellent’ categories. It also appears that smokers outnumber non-smokers in the ‘fair’ and ‘poor’ health categories. This is consistent with the well-documented association between smoking and poorer health.
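Row shares make the comparison between smokers and non-smokers independent of category size. The following sketch uses only the three categories printed in the head() output above (‘Fair’ and ‘Poor’ are not shown there); the data frame `smoke_counts` is built by hand for illustration.

```r
# Counts copied from the head() output above; 'Fair' and 'Poor' are not
# shown there, so they are left out of this sketch.
smoke_counts <- data.frame(
  genhlth  = factor(rep(c("Excellent", "Very good", "Good"), each = 2),
                    levels = c("Excellent", "Very good", "Good")),
  smoke100 = factor(rep(c("Yes", "No"), times = 3), levels = c("Yes", "No")),
  n        = c(28342, 54392, 63642, 91157, 69765, 75995)
)

# Cross-tabulate the counts, then convert each row to shares that sum to 1
tab <- xtabs(n ~ genhlth + smoke100, data = smoke_counts)
row_share <- prop.table(tab, margin = 1)

print(round(row_share, 3))
```

Across these three categories, the share of respondents who have smoked at least 100 cigarettes increases as self-reported health moves from ‘Excellent’ to ‘Good’.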
To understand how alcohol consumption affects an individual’s general health, we will create a new variable current_drinker based on the feature avedrnk2. We will assume that, if avedrnk2 is not NA for a particular individual, the individual consumes alcohol.
drinking_no_drinking <- smoke_drink %>% mutate(current_drinker=ifelse(is.na(avedrnk2),"No","Yes")) %>%
group_by(genhlth,current_drinker) %>% tally() %>% filter(!is.na(genhlth))
head(drinking_no_drinking)
## # A tibble: 6 x 3
## # Groups: genhlth [3]
## genhlth current_drinker n
## <fct> <chr> <int>
## 1 Excellent No 36413
## 2 Excellent Yes 49069
## 3 Very good No 70147
## 4 Very good Yes 88929
## 5 Good No 84626
## 6 Good Yes 65929
We create a grouped column chart of health conditions and counts of whether individuals are alcohol consumers or not.
fig <- plot_ly(data=drinking_no_drinking, x = ~genhlth, y = ~n, color = ~current_drinker, type = 'bar') %>%
layout(autosize = F, width = 800, height = 400,
title="Currently Consume Alcohol? (Y/N) vs. General Health",
xaxis=list(title="Health Condition"),yaxis=list(title="Count"))
fig
We observe from the bar plot above that the number of alcohol consumers rises steadily from the ‘poor’ to the ‘very good’ health category and then drops significantly for those in the ‘excellent’ category. One reading is that there are fewer alcohol consumers in the ‘excellent’, ‘fair’, and ‘poor’ health categories, although these raw counts also reflect how many respondents fall in each category.
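One way to check whether drinking status and health category are related beyond differences in group size is to compare row shares and run a chi-squared test of independence. This is a rough sketch, not part of the original analysis: it uses only the three categories printed in the head() output above, works on unweighted counts, and ignores the survey design, so the p-value should be read with caution.

```r
# Counts copied from the head() output above ('Fair' and 'Poor' are not shown there)
drink_tab <- matrix(
  c(36413, 49069,
    70147, 88929,
    84626, 65929),
  ncol = 2, byrow = TRUE,
  dimnames = list(genhlth = c("Excellent", "Very good", "Good"),
                  current_drinker = c("No", "Yes"))
)

# Share of assumed drinkers within each health category
drinker_share <- prop.table(drink_tab, margin = 1)[, "Yes"]

# Chi-squared test of independence on the unweighted counts
test <- chisq.test(drink_tab)

print(round(drinker_share, 3))
print(test$p.value)
```

With samples this large, even small differences in shares produce a very small p-value, so the shares themselves are more informative than the test. Among these three categories, the share of assumed drinkers is highest for ‘Excellent’ and lowest for ‘Good’.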
Research question 3: “What are some of the best habits or activities in which a person can engage to improve their overall health? It’s a given that most people want to improve their health but they are, more often than not, either confused or uncertain as to what to do to achieve this goal.”
To answer this question, we will investigate how one of the provided calculated variables, X_pacat1 (Physical Activity Categories), correlates with an individual’s general health condition.
First, let’s subset the brfss2013 dataset using genhlth and X_pacat1….
physical_activity <- brfss2013 %>% select(genhlth,X_pacat1)
head(physical_activity)
## genhlth X_pacat1
## 1 Fair Inactive
## 2 Good Insufficiently active
## 3 Good Inactive
## 4 Very good Insufficiently active
## 5 Good Inactive
## 6 Very good Insufficiently active
activity_summary <- physical_activity %>% group_by(genhlth,X_pacat1) %>% tally() %>%
filter(!is.na(genhlth),!is.na(X_pacat1))
head(activity_summary)
## # A tibble: 6 x 3
## # Groups: genhlth [2]
## genhlth X_pacat1 n
## <fct> <fct> <int>
## 1 Excellent Highly active 31183
## 2 Excellent Active 14881
## 3 Excellent Insufficiently active 12757
## 4 Excellent Inactive 12605
## 5 Very good Highly active 49750
## 6 Very good Active 27230
Let’s generate a bar chart of Physical Activity category counts vs. General Health.
fig <- plot_ly(data=activity_summary, x = ~genhlth, y = ~n, color = ~X_pacat1, type = 'bar') %>%
layout(autosize = F, width = 800, height = 400,
title="Physical Activity vs. General Health",
xaxis=list(title="Health Condition"),yaxis=list(title="Count"),
legend=list(orientation = "v", x = 0.65, y = 0.9))
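fig
The plot above suggests that physical activity is positively associated with a person’s health condition: people in better health tend to be more active. Because this is observational survey data, the association does not by itself show that activity causes better health.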
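Since both genhlth and X_pacat1 are ordered categories, a rank correlation is one way to put a number on the association. The sketch below uses hypothetical toy rows, not brfss2013, and sets the level order explicitly from worst to best; in the real data the factor levels may be stored in a different order and would need the same treatment.

```r
# Level orders from worst to best; stated explicitly so integer codes are meaningful
health_levels   <- c("Poor", "Fair", "Good", "Very good", "Excellent")
activity_levels <- c("Inactive", "Insufficiently active", "Active", "Highly active")

# Hypothetical toy respondents, not taken from brfss2013
toy <- data.frame(
  genhlth  = factor(c("Poor", "Fair", "Good", "Good", "Very good", "Excellent"),
                    levels = health_levels, ordered = TRUE),
  X_pacat1 = factor(c("Inactive", "Inactive", "Insufficiently active",
                      "Active", "Active", "Highly active"),
                    levels = activity_levels, ordered = TRUE)
)

# Spearman rank correlation on the integer codes of the ordered factors
rho <- cor(as.integer(toy$genhlth), as.integer(toy$X_pacat1), method = "spearman")

print(rho)
```

A value of rho above zero indicates that higher activity categories tend to go together with better self-reported health in the toy rows.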