Exploring the BRFSS data

Introduction

The data for this project come from the “Behavioral Risk Factor Surveillance System” (BRFSS). You can learn more about the survey at https://www.cdc.gov/brfss/.

The BRFSS is a national telephone survey that collects data from U.S. residents (adults aged 18 and older) on their health-related risk behaviors, chronic health conditions, and use of preventive services. It was established in 1984 with 15 states and now collects data in all 50 states as well as the District of Columbia and three U.S. territories. The survey completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world.

Note on causality: Because this is an observational, cross-sectional survey, the data cannot establish causation. We can draw conclusions about prevalence and about associations between variables, but we cannot determine the direction of an association: we cannot tell whether one outcome causes the other or the reverse.

Load packages

library(ggplot2)
library(dplyr)
library(magrittr)
library(scales)
library(RColorBrewer)
library(tidyverse)
library(pander)
library(statsr)
library(devtools)

Load data

load("brfss2013.RData")

Is there a relation between the number of hours slept and the participant’s body mass index (BMI)?

Background: Some studies in the field of chrononutrition have suggested a relationship between individuals’ sleep time and their weight status. In this question, I explore this possible relationship under the hypothesis that people who sleep less have a higher BMI.

After removing NAs and refusals across the three variables of interest, the dataset contains 458,915 observations.

Variables of interest:

  • sleptim1 - Discrete variable: range 1-24 (“On average, how many hours of sleep do you get in a 24-hour period?”)
  • X_bmi5cat - Categorical variable: 1 - Underweight; 2 - Normal Weight; 3 - Overweight; 4 - Obese
  • sex - Binary variable: 1 - Male; 2 - Female
##     sleptim1              X_bmi5cat          sex        
##  Min.   : 1.000   Underweight  :  8054   Male  :195085  
##  1st Qu.: 6.000   Normal weight:152911   Female:263830  
##  Median : 7.000   Overweight   :165107                  
##  Mean   : 7.049   Obese        :132843                  
##  3rd Qu.: 8.000                                         
##  Max.   :24.000

The summary output shows the absolute frequency of each level of the categorical variables (sex and BMI category). The summary statistics for sleep time, a discrete variable, show a range of 1-24 hours with a mean (7.049 hours) close to the median (7 hours).
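The code that builds the `sleep` data frame summarized above is not shown. As a minimal sketch of one way it could be derived, assume the subset is just the three variables with incomplete rows dropped. The toy `brfss2013` below is a hypothetical stand-in so the snippet runs without the real file; the actual cleaning steps (for example, how refusals were coded) may differ.

```r
# Hypothetical stand-in for the real brfss2013 data frame (toy values only).
brfss2013 <- data.frame(
  sleptim1  = c(7L, 6L, NA, 8L, 5L),
  X_bmi5cat = factor(c("Normal weight", "Obese", "Overweight", NA, "Underweight"),
                     levels = c("Underweight", "Normal weight", "Overweight", "Obese")),
  sex       = factor(c("Male", "Female", "Female", "Male", NA),
                     levels = c("Male", "Female"))
)

# Assumption: keep the three variables of interest and drop incomplete rows.
vars  <- c("sleptim1", "X_bmi5cat", "sex")
sleep <- brfss2013[complete.cases(brfss2013[, vars]), vars]

summary(sleep)
```

With the real data loaded, the same two lines after the stand-in would be applied to the full `brfss2013` object.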

Descriptive statistics

table1 <- table(sleep$sex, sleep$X_bmi5cat)
prop_table <- prop.table(table1, 1)
prop_table_percent <- prop_table * 100
pander(prop_table_percent, caption = "Contingency Table of BMI status by sex (%)")

Contingency Table of BMI status by sex (%)
         Underweight   Normal weight   Overweight   Obese
Male     0.9565        26.89           43.03        29.13
Female   2.345         38.08           30.76        28.81

result <- aggregate(x = sleep$sleptim1,
                    by = list(sleep$X_bmi5cat, sleep$sex),
                    FUN = mean)
pander(result, caption = "Mean Sleep Time by BMI Category and Sex")

Mean Sleep Time by BMI Category and Sex
Group.1         Group.2   x
Underweight     Male      7.034
Normal weight   Male      7.105
Overweight      Male      7.041
Obese           Male      6.945
Underweight     Female    7.098
Normal weight   Female    7.126
Overweight      Female    7.078
Obese           Female    6.959

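The descriptive statistics show that the most common category among men is overweight (43.0%), while among women it is normal weight (38.1%). The obesity rate is close to 30% for both sexes (29.1% of men and 28.8% of women), and very few individuals are underweight (1.0% of men and 2.3% of women).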
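The `1` passed to `prop.table()` above requests row proportions, which is why each sex sums to 100%. The small sketch below uses made-up counts (not BRFSS values) to show how the margin argument changes what the percentages mean.

```r
# Toy counts, invented purely for illustration (not BRFSS data).
counts <- matrix(c(10, 30, 40, 20,
                   20, 40, 25, 15),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(sex = c("Male", "Female"),
                                 bmi = c("Underweight", "Normal weight",
                                         "Overweight", "Obese")))
tab <- as.table(counts)

row_pct <- prop.table(tab, margin = 1) * 100  # BMI distribution within each sex
col_pct <- prop.table(tab, margin = 2) * 100  # sex split within each BMI category

rowSums(row_pct)  # each row sums to 100
colSums(col_pct)  # each column sums to 100
```

The contingency table above uses the first reading (BMI within sex), while the stacked bar chart that follows groups by BMI category first and so corresponds to the second.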

Mean sleep time was very similar between the sexes: 7.03 h for men and 7.06 h for women.

By weight status, the highest mean sleep time was among normal-weight women (7.126 h) and the lowest among obese men (6.945 h).

# Plot the bar graph with percentage labels
prop_table <- sleep %>%
  group_by(X_bmi5cat, sex) %>%
  summarise(count = n()) %>%
  mutate(prop = count / sum(count) * 100)
## `summarise()` has grouped output by 'X_bmi5cat'. You can override using the
## `.groups` argument.
ggplot(prop_table, aes(x = X_bmi5cat, y = prop, fill = sex)) +
  geom_col(position = "stack") +
  geom_text(aes(label = paste0(round(prop), "%")),
            position = position_stack(vjust = 0.5),
            color = "white", fontface = "bold", hjust = 0.5) +
  labs(x = "BMI Categories", y = "Percentage", fill = "Sex") +
  scale_fill_brewer(palette = "Set2") +
  theme_classic()

Plotting graphs

ggplot(sleep, aes(x = X_bmi5cat, y = sleptim1, fill = sex)) +
  geom_boxplot() +
  labs(y = "Hours of Sleep", x = "BMI categories") +
  theme_bw() +
  scale_fill_brewer(palette = "Set2")

The boxplots of sleep time by sex within each BMI category show no relevant differences between the distributions. However, there are several outliers across the categories.
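The points drawn as outliers follow the default boxplot rule: values more than 1.5 × IQR beyond the quartiles. As a rough check, the sketch below applies that rule to the pooled quartiles from the summary output (6 and 8 hours). It assumes the per-group quartiles match the pooled ones, which the text does not confirm.

```r
# Pooled quartiles of sleptim1 taken from the summary output above.
q1 <- 6
q3 <- 8

iqr         <- q3 - q1          # interquartile range
lower_fence <- q1 - 1.5 * iqr   # values below this are drawn as outliers
upper_fence <- q3 + 1.5 * iqr   # values above this are drawn as outliers

c(lower = lower_fence, upper = upper_fence)
```

Under that assumption, anyone reporting fewer than 3 or more than 11 hours of sleep is plotted as an individual point, so a sample of this size is expected to show many such points.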

Inference

  • Independent-samples t-test: to compare mean sleep time between sexes.
  • Chi-square test of independence: to test the association between BMI categories and sex.
  • ANOVA: to compare mean sleep time between BMI categories.
ttest <- t.test(sleep$sleptim1 ~ sleep$sex)
pander(ttest, caption = "Independent t-test - Sleeping time by sex")

Independent t-test - Sleeping time by sex
Test statistic   df       P value         Alternative hypothesis
-7.348           425925   2.009e-13 ***   two.sided

mean in group Male   mean in group Female
7.03                 7.062

chi_square <- chisq.test(sleep$X_bmi5cat, sleep$sex, correct = FALSE)
pander(chi_square, caption = "Chi-Square Test Results")

Chi-Square Test Results
Test statistic   df   P value
10141            3    0 ***

oneway <- aov(sleptim1 ~ X_bmi5cat, data = sleep)
summary(oneway)
##                 Df Sum Sq Mean Sq F value Pr(>F)    
## X_bmi5cat        3   1988   662.8   310.4 <2e-16 ***
## Residuals   458911 979913     2.1                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
pander(oneway, caption = "Analysis of variance - Sleeping time by BMI categories")

Analysis of variance - Sleeping time by BMI categories
            Df       Sum Sq   Mean Sq   F value   Pr(>F)
X_bmi5cat   3        1988     662.8     310.4     2.394e-201
Residuals   458911   979913   2.135     NA        NA

TukeyHSD(oneway)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = sleptim1 ~ X_bmi5cat, data = sleep)
## 
## $X_bmi5cat
##                                  diff          lwr         upr     p adj
## Normal weight-Underweight  0.03520084 -0.007717174  0.07811885 0.1507698
## Overweight-Underweight    -0.02454772 -0.067386343  0.01829091 0.4544401
## Obese-Underweight         -0.13031921 -0.173399119 -0.08723930 0.0000000
## Overweight-Normal weight  -0.05974855 -0.073072192 -0.04642492 0.0000000
## Obese-Normal weight       -0.16552005 -0.179600173 -0.15143992 0.0000000
## Obese-Overweight          -0.10577149 -0.119607752 -0.09193524 0.0000000
tukey <- TukeyHSD(oneway)
pander(tukey, caption = "Analysis of variance with Tukey multiple pairwise-comparisons - Sleeping time by BMI categories")
## Warning in pander.default(tukey, caption = "Analysis of variance with
## Tukey multiple pairwise-comparisons - Sleeping time by BMI categories"):
## No pander.method for "TukeyHSD", reverting to default.No pander.method for
## "multicomp", reverting to default.
  • X_bmi5cat:

                                  diff       lwr         upr        p adj
      Normal weight-Underweight   0.0352     -0.007717   0.07812    0.1508
      Overweight-Underweight      -0.02455   -0.06739    0.01829    0.4544
      Obese-Underweight           -0.1303    -0.1734     -0.08724   7.006e-14
      Overweight-Normal weight    -0.05975   -0.07307    -0.04642   0
      Obese-Normal weight         -0.1655    -0.1796     -0.1514    0
      Obese-Overweight            -0.1058    -0.1196     -0.09194   0

Conclusions:

  • Because the p-value of the independent t-test (p<0.001) is below alpha = 0.05, we reject the null hypothesis of equal means. There is sufficient evidence that mean sleeping time differs between the sexes among US adults, although the difference is small (7.03 h for men vs. 7.06 h for women).

  • The p-value of the chi-square test (p<0.001) is below the significance level of 0.05, so we reject the null hypothesis of independence and conclude that BMI category and sex are associated.
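With a sample this large, almost any departure from independence is statistically significant, so an effect size helps gauge how strong the association is. The sketch below computes Cramér's V from the statistics reported above. This measure is an addition for illustration and not part of the original analysis.

```r
chi_sq <- 10141    # chi-square statistic reported above
n      <- 458915   # number of observations
n_rows <- 4        # BMI categories
n_cols <- 2        # sexes

# Cramér's V = sqrt(chi^2 / (n * (min(rows, cols) - 1)))
cramers_v <- sqrt(chi_sq / (n * (min(n_rows, n_cols) - 1)))
round(cramers_v, 3)
```

The result is roughly 0.15, which by common rules of thumb indicates a fairly weak association despite the very small p-value.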

In a one-way ANOVA, a significant p-value indicates that at least some group means differ, but not which pairs of groups differ. We therefore perform multiple pairwise comparisons to determine whether the mean difference between specific pairs of groups is statistically significant.

  • As the p-value of the ANOVA (p<0.001) is below the significance level of 0.05, we conclude that mean sleeping time differs significantly among BMI categories.

  • In the multiple pairwise comparisons, all differences are significant except those between the underweight and normal-weight groups and between the underweight and overweight groups, whose adjusted p-values exceed 0.05.
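The same reading can be taken from the confidence intervals instead of the adjusted p-values: a pair differs at the 5% family-wise level exactly when its Tukey interval excludes zero. The sketch below copies the rounded bounds from the output above into a small data frame. The name `tukey_tab` is only illustrative.

```r
# Tukey HSD interval bounds copied from the output above (rounded).
tukey_tab <- data.frame(
  comparison = c("Normal weight-Underweight", "Overweight-Underweight",
                 "Obese-Underweight", "Overweight-Normal weight",
                 "Obese-Normal weight", "Obese-Overweight"),
  lwr = c(-0.007717, -0.06739, -0.1734, -0.07307, -0.1796, -0.1196),
  upr = c( 0.07812,   0.01829, -0.08724, -0.04642, -0.1514, -0.09194),
  stringsAsFactors = FALSE
)

# A pair is flagged as significant when its interval does not contain zero.
tukey_tab$significant <- tukey_tab$lwr > 0 | tukey_tab$upr < 0
tukey_tab[, c("comparison", "significant")]
```

Only the two comparisons involving the underweight group and the normal-weight or overweight groups have intervals that span zero, in line with the adjusted p-values.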

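In summary, the results provide evidence of an association between sleeping time and nutritional status (BMI category) among US adults: the obese group slept least on average, about 0.17 hours less than the normal-weight group. Because the data are cross-sectional, this association does not show that shorter sleep causes higher BMI.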
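To put the size of this association in perspective, the sums of squares in the ANOVA table can be turned into eta squared, the share of the variance in sleep time accounted for by BMI category. This is a sketch added for illustration, using the rounded values printed above. It is not part of the original analysis.

```r
ss_between <- 1988     # Sum Sq for X_bmi5cat from the ANOVA table
ss_within  <- 979913   # Sum Sq for Residuals

# Eta squared: proportion of total variance in sleep time explained by BMI category
eta_sq <- ss_between / (ss_between + ss_within)
round(100 * eta_sq, 2)   # expressed as a percentage
```

BMI category accounts for roughly 0.2% of the variance in reported sleep time, so the association is statistically clear but small in magnitude.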
