Setup

Load packages

library(ggplot2)
library(dplyr)

Load data

load("brfss2013.RData")

Part 1: Data

About the Data

The Behavioral Risk Factor Surveillance System (BRFSS) is a collaborative project between all 50 states in the United States (US) and the Centers for Disease Control and Prevention (CDC). It conducts telephone surveys to collect state-specific data on preventive health practices and risk behaviors linked to adult health problems. For example, respondents are asked about general health, health care access, inadequate sleep, cholesterol awareness, chronic health conditions, demographics, tobacco use, alcohol consumption, fruit and vegetable consumption, exercise (physical activity), arthritis burden, seatbelt use, HIV/AIDS status, diabetes, sugar drinks, and other topics.

Data collection

The data collection process is explained in brfss_codebook. BRFSS conducts both landline and cellular telephone surveys. In the landline survey, interviewers collect data from a randomly selected adult in a household residing in the US. The design uses stratified sampling: the population is divided into groups (states) called strata, and a random sample of telephone numbers is then drawn within each stratum.
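
The stratified design described above can be illustrated with a minimal base R sketch. The population, the three strata labels, and the per-stratum sample size are made up for illustration; this is not the BRFSS sampling procedure itself, only the general idea of drawing a random sample within each stratum.

```r
# Toy population with three hypothetical strata ("states") of different sizes.
set.seed(42)
population <- data.frame(
  id    = 1:600,
  state = rep(c("A", "B", "C"), times = c(300, 200, 100))
)

# Hypothetical sample size per stratum (an assumption for this sketch).
n_per_stratum <- 20

# Split the population by stratum, then draw a simple random sample in each.
strata  <- split(population, population$state)
sampled <- do.call(rbind, lapply(strata, function(s) {
  s[sample(nrow(s), n_per_stratum), ]
}))

# Every stratum contributes the same number of sampled rows.
table(sampled$state)
```

Sampling within each stratum guarantees that small strata are represented, which a single simple random sample of the whole population would not.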

Generalizability

Because the data are collected from a large stratified random sample (491,775 US adults aged 18 years or older), the results can reasonably be generalized to the US adult population, with the caveat that adults who cannot be reached by telephone are not represented.

Causality

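Since BRFSS is an observational study (survey data, with no random assignment of respondents to conditions), the analysis can show associations between variables but cannot establish causation.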


Part 2: Research questions

Research question 1: Exploring the relationship between how often US adults drink regular soda or pop, their fruit intake, and whether they are overweight or obese.

Research question 2: Exploring the relationship between how often US adults do muscle-strengthening exercise and whether they are overweight or obese.

Research question 3: Exploring whether computed weight in kilograms follows a normal distribution.


Part 3: Exploratory data analysis

Research question 1:

# During the past 30 days, how often did you drink regular soda or pop that contains sugar?
brfss2013 %>% 
  filter(!is.na(ssbsugar)) %>%
  group_by(ssbsugar) %>% 
  summarise(count = n())
## # A tibble: 93 x 2
##    ssbsugar count
##       <int> <int>
##  1        0 46115
##  2      101  8317
##  3      102  3647
##  4      103  1361
##  5      104   599
##  6      105   324
##  7      106   192
##  8      107    72
##  9      108    58
## 10      109     3
## # ... with 83 more rows
# Fruit intake in times per day
brfss2013 %>%
  filter(!is.na(frutda1_)) %>%
  group_by(frutda1_) %>% 
  summarise(count = n())
## # A tibble: 128 x 2
##    frutda1_ count
##       <int> <int>
##  1        0 17896
##  2        2  1143
##  3        3  5588
##  4        7  9214
##  5       10  8015
##  6       13  6333
##  7       14 13351
##  8       17  9811
##  9       20  3596
## 10       23  2610
## # ... with 118 more rows
# Adults who have a body mass index greater than 25.00 (Overweight or Obese)
brfss2013 %>%
  filter(!is.na(X_rfbmi5)) %>%
  group_by(X_rfbmi5) %>% 
  summarise(count = n())
## # A tibble: 2 x 2
##   X_rfbmi5  count
##   <fct>     <int>
## 1 No       163161
## 2 Yes      301885
# Adults who are not overweight or obese: counts by soda intake and fruit intake
brfss2013 %>% 
  group_by(ssbsugar, frutda1_, X_rfbmi5) %>%
  summarise(count = n()) %>%
  filter(X_rfbmi5 == 'No')
## # A tibble: 1,193 x 4
## # Groups:   ssbsugar, frutda1_ [1,193]
##    ssbsugar frutda1_ X_rfbmi5 count
##       <int>    <int> <fct>    <int>
##  1        0        0 No         533
##  2        0        2 No          12
##  3        0        3 No         134
##  4        0        7 No         195
##  5        0       10 No         151
##  6        0       13 No         165
##  7        0       14 No         308
##  8        0       17 No         200
##  9        0       20 No          77
## 10        0       23 No          55
## # ... with 1,183 more rows
# Adults who are overweight or obese: counts by soda intake and fruit intake
brfss2013 %>% 
  group_by(ssbsugar, frutda1_, X_rfbmi5) %>%
  summarise(count = n()) %>%
  filter(X_rfbmi5 == 'Yes')
## # A tibble: 1,532 x 4
## # Groups:   ssbsugar, frutda1_ [1,532]
##    ssbsugar frutda1_ X_rfbmi5 count
##       <int>    <int> <fct>    <int>
##  1        0        0 Yes       1034
##  2        0        2 Yes         27
##  3        0        3 Yes        274
##  4        0        7 Yes        471
##  5        0       10 Yes        409
##  6        0       13 Yes        338
##  7        0       14 Yes        670
##  8        0       17 Yes        526
##  9        0       20 Yes        209
## 10        0       23 Yes        121
## # ... with 1,522 more rows
plot(brfss2013$ssbsugar, brfss2013$X_rfbmi5, xlab = 'soda or pop drink', ylab = 'obese')

Conclusion: from these tables and this plot, we do not see a clear relationship between how often a US adult drinks regular soda or pop, their fruit intake, and whether they are overweight or obese.

# Keep only respondents with a known overweight/obese status
brfss2013_clean <- brfss2013 %>% 
  filter(!is.na(X_rfbmi5))

ggplot(data=brfss2013_clean, aes(x=X_rfbmi5, y=ssbsugar)) + geom_bar(stat="identity")
## Warning: Removed 366639 rows containing missing values (position_stack).

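In the figure above, the bar for the overweight or obese group is taller, which is consistent with people who drink soda or pop being more likely to be overweight. However, each bar stacks the ssbsugar values of every respondent in the group, so its height also reflects group size (301,885 overweight or obese respondents versus 163,161 others). The figure alone therefore does not establish the relationship.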
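
One way to make the two groups comparable is to convert the coded ssbsugar values to a common unit and compare group means rather than stacked totals. The sketch below uses toy data and a hypothetical helper, `soda_per_month()`. It assumes the coding 0 = never, 101-199 = times per day, 201-299 = times per week, 301-399 = times per month, and a 30-day month; both assumptions should be checked against brfss_codebook before use.

```r
# Hypothetical helper: convert a coded frequency to times per month.
# Assumed coding (verify against the codebook): 0 = never,
# 101-199 = per day, 201-299 = per week, 301-399 = per month.
soda_per_month <- function(code) {
  count <- code %% 100    # last two digits: reported number of times
  unit  <- code %/% 100   # leading digit: time unit of the answer
  ifelse(code == 0, 0,
  ifelse(unit == 1, count * 30,       # per day  -> per 30 days
  ifelse(unit == 2, count * 30 / 7,   # per week -> per 30 days
  ifelse(unit == 3, count, NA_real_))))
}

# Toy data standing in for brfss2013 (values are made up).
toy <- data.frame(
  ssbsugar = c(0, 101, 203, 305, NA, 102),
  X_rfbmi5 = factor(c("No", "Yes", "Yes", "No", "Yes", "No"))
)
toy$soda_month <- soda_per_month(toy$ssbsugar)

# Mean monthly intake per group; rows with missing intake are dropped.
aggregate(soda_month ~ X_rfbmi5, data = toy, FUN = mean)
```

Comparing means (or medians) per group removes the effect of group size that a stacked bar chart carries.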

Research question 2:

# Mean, median, sd, min and max of strength: how many times per week or per month
# a respondent did physical activities or exercises to strengthen their muscles
brfss2013 %>% 
  filter(!(is.na(strength))) %>%
  summarise(strength_mean = mean(strength), strength_median = median(strength), strength_sd = sd(strength), 
            strength_min = min(strength), strength_max = max(strength))
##   strength_mean strength_median strength_sd strength_min strength_max
## 1      51.50212               0    74.86458            0         9072
brfss2013 %>%
  group_by(strength, X_rfbmi5) %>%
  summarise(count = n()) %>%
  filter(X_rfbmi5 == 'No')
## # A tibble: 91 x 3
## # Groups:   strength [91]
##    strength X_rfbmi5 count
##       <int> <fct>    <int>
##  1        0 No       82509
##  2        2 No           1
##  3      101 No        6854
##  4      102 No       10628
##  5      103 No       12734
##  6      104 No        4671
##  7      105 No        4688
##  8      106 No        1541
##  9      107 No        5297
## 10      108 No          35
## # ... with 81 more rows
brfss2013 %>%
  group_by(strength, X_rfbmi5) %>%
  summarise(count = n()) %>%
  filter(X_rfbmi5 == 'Yes')
## # A tibble: 101 x 3
## # Groups:   strength [101]
##    strength X_rfbmi5  count
##       <int> <fct>     <int>
##  1        0 Yes      186155
##  2      101 Yes        9125
##  3      102 Yes       14402
##  4      103 Yes       17166
##  5      104 Yes        5814
##  6      105 Yes        6074
##  7      106 Yes        1890
##  8      107 Yes        6844
##  9      108 Yes          38
## 10      109 Yes           7
## # ... with 91 more rows
plot(brfss2013$X_rfbmi5, brfss2013$strength)
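
Because strength appears to be stored as a coded value (the tables above show values such as 101 to 109), its raw magnitude is hard to compare across groups. A simpler comparison, sketched here in base R on made-up toy data, is the share of respondents in each BMI group who report any muscle-strengthening activity (strength greater than 0). The column `any_strength` is a hypothetical name introduced for this sketch.

```r
# Toy data standing in for brfss2013 (values are made up).
toy <- data.frame(
  strength = c(0, 0, 102, 201, 0, 103, NA, 0),
  X_rfbmi5 = factor(c("Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No"))
)

# Drop respondents with a missing strength answer.
complete <- toy[!is.na(toy$strength), ]

# Flag respondents who report any strengthening activity at all.
complete$any_strength <- complete$strength > 0

# Row-wise proportions: within each BMI group, the share with and without activity.
shares <- prop.table(table(complete$X_rfbmi5, complete$any_strength), margin = 1)
shares
```

Row proportions put both groups on the same scale, so a difference between the rows reflects behavior rather than the number of respondents in each group.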