Setup

Load packages

library(ggplot2)
library(dplyr)

Load data

load("brfss2013.RData")

Part 1: Data

Data Collection

The BRFSS collects data through random telephone surveys (landline and cell) across all 50 states and US territories. One adult per household is randomly selected. Interviews cover health behaviors, chronic conditions, access to care, and demographics. The CDC and state health departments run the survey annually.

Results generalize to non-institutionalized US adults aged 18 and over. This rests on random sampling, a large sample (about 492,000 respondents in 2013), and broad geographic coverage. Limitations: the survey misses people in institutions and people without phones, and it relies on self-report, which introduces potential bias.

No causal claims can be made. The study is observational, with no random assignment, so it supports associations only. If we see a link between sleep and depression, we can’t say one causes the other.


Part 2: Research questions

Research question 1: Is there an association between sleep duration (sleptim1) and mental health days (menthlth), and does this relationship differ by employment status (employ1)? Sleep and mental health are linked, but the pressure of work, or the lack of it, might change that dynamic. Unemployed people and employed people face different stressors.

Research question 2: Among adults with arthritis (havarth3), is there a relationship between joint pain severity (joinpain) and physical activity (exerany2), and does income (income2) moderate this?

Pain limits movement, but does money buy access to better management—physical therapy, gym memberships, medication—that keeps people active despite pain?

Research question 3: Is there an association between internet use (internet) and depression diagnosis (addepev2)? We talk about screen time and mental health constantly, but the relationship isn’t simple. Internet access also means access to resources, connection, and telehealth. Worth examining in a large population sample.

Part 3: Exploratory data analysis

Research question 1:

rq1_data <- brfss2013 %>%
  filter(!is.na(sleptim1), !is.na(menthlth), !is.na(employ1), sleptim1 <= 24)

rq1_data %>%
  group_by(employ1) %>%
  summarise(
    n = n(),
    mean_sleep = mean(sleptim1),
    mean_mental = mean(menthlth),
    median_sleep = median(sleptim1),
    median_mental = median(menthlth)
  )
## # A tibble: 8 × 6
##   employ1                    n mean_sleep mean_mental median_sleep median_mental
##   <fct>                  <int>      <dbl>       <dbl>        <dbl>         <dbl>
## 1 Employed for wages    198958       6.89        2.69            7             0
## 2 Self-employed          39082       7.08        2.39            7             0
## 3 Out of work for 1 ye…  13527       6.91        6.66            7             0
## 4 Out of work for less…  11906       6.99        5.88            7             0
## 5 A homemaker            30537       7.19        3.06            7             0
## 6 A student              12451       7.08        4.11            7             0
## 7 Retired               132637       7.35        2.12            7             0
## 8 Unable to work         34564       6.75       10.7             6             5
# Visualization
ggplot(rq1_data, aes(x = sleptim1, y = menthlth)) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~employ1) +
  labs(
    title = "Sleep vs. Poor Mental Health Days by Employment",
    x = "Hours of Sleep",
    y = "Days Mental Health Not Good")
## `geom_smooth()` using formula = 'y ~ x'

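The “Unable to work” group shows the worst mental health (mean 10.7 bad days, median 5) and the least sleep (mean 6.75 hours, median 6). Retired adults report the fewest bad mental health days (mean 2.12) and the most sleep (mean 7.35 hours). In every other group the median is 0 bad days and 7 hours of sleep, so the distributions are right-skewed: at least half of respondents in those groups report no bad days. The scatterplots show a negative relationship across all groups: less sleep is associated with more bad mental health days. The fitted line is steepest for those unable to work, so the association between short sleep and poor mental health appears strongest in this group.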
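The visual comparison of slopes can be put into numbers by fitting a separate least-squares line within each employment group. The following is a minimal sketch, assuming a data frame with the columns sleptim1, menthlth, and employ1 as in rq1_data. The helper name slope_by_group and the toy data frame are hypothetical and exist only so the snippet runs without the BRFSS file.

```r
# Hypothetical helper: slope of menthlth on sleptim1 within each employ1 group.
slope_by_group <- function(data) {
  groups <- split(data, data$employ1, drop = TRUE)
  sapply(groups, function(d) {
    coef(lm(menthlth ~ sleptim1, data = d))[["sleptim1"]]
  })
}

# Toy data (NOT BRFSS values), used only to make the example runnable.
toy <- data.frame(
  employ1  = rep(c("Retired", "Unable to work"), each = 4),
  sleptim1 = c(5, 6, 7, 8, 5, 6, 7, 8),
  menthlth = c(4, 3, 2, 1, 16, 12, 8, 4)
)

slopes <- slope_by_group(toy)
print(slopes)
```

On the real data, slope_by_group(rq1_data) would return one slope per employ1 level, so the groups can be compared directly instead of by eye.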

Research question 2:

rq2_data <- brfss2013 %>%
  filter(havarth3 == "Yes") %>%
  filter(!is.na(joinpain), !is.na(exerany2), !is.na(income2))

rq2_data %>%
  group_by(exerany2, income2) %>%
  summarise(
    n = n(),
    mean_pain = mean(joinpain),
    median_pain = median(joinpain)
  )
## `summarise()` has grouped output by 'exerany2'. You can override using the
## `.groups` argument.
## # A tibble: 16 × 5
## # Groups:   exerany2 [2]
##    exerany2 income2               n mean_pain median_pain
##    <fct>    <fct>             <int>     <dbl>       <dbl>
##  1 Yes      Less than $10,000  5131      6.24           7
##  2 Yes      Less than $15,000  5989      5.73           6
##  3 Yes      Less than $20,000  7286      5.26           5
##  4 Yes      Less than $25,000  8861      4.73           5
##  5 Yes      Less than $35,000 10345      4.37           4
##  6 Yes      Less than $50,000 12876      4.00           4
##  7 Yes      Less than $75,000 12981      3.75           3
##  8 Yes      $75,000 or more   19884      3.35           3
##  9 No       Less than $10,000  4499      7.05           8
## 10 No       Less than $15,000  5214      6.40           7
## 11 No       Less than $20,000  5562      6.03           6
## 12 No       Less than $25,000  6034      5.57           6
## 13 No       Less than $35,000  5916      5.21           5
## 14 No       Less than $50,000  5987      4.78           5
## 15 No       Less than $75,000  4758      4.60           5
## 16 No       $75,000 or more    4892      4.26           4
ggplot(rq2_data, aes(x = income2, y = joinpain, fill = exerany2)) +
  geom_boxplot() +
  labs(
    title = "Joint Pain by Income and Exercise Status (Arthritis Patients)",
    x = "Income Level",
    y = "Joint Pain (0-10)",
    fill = "Exercised Past 30 Days"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Clear pattern: higher income is associated with lower pain regardless of exercise status, and exercisers report less pain than non-exercisers at every income level. The gap between exercisers and non-exercisers is roughly constant across incomes, at about 0.7 to 0.9 points: mean pain is 6.24 vs 7.05 for <$10K (a gap of 0.81) and 3.35 vs 4.26 for $75K+ (a gap of 0.91). The group means therefore give no sign that the exercise gap narrows as income rises, so income does not appear to moderate the relationship much. Because the data are observational, the direction is also unclear: people in less pain may simply find it easier to exercise.
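The gap between exercisers and non-exercisers at each income level can be checked directly from the group means printed in the summary table above. A small sketch in base R follows. The column names exercise_yes, exercise_no, and gap are made up for this illustration, and the values are copied from the summary output.

```r
# Group means copied from the summary table above (mean_pain by income2 and exerany2).
pain_means <- data.frame(
  income2 = c("Less than $10,000", "Less than $15,000", "Less than $20,000",
              "Less than $25,000", "Less than $35,000", "Less than $50,000",
              "Less than $75,000", "$75,000 or more"),
  exercise_yes = c(6.24, 5.73, 5.26, 4.73, 4.37, 4.00, 3.75, 3.35),
  exercise_no  = c(7.05, 6.40, 6.03, 5.57, 5.21, 4.78, 4.60, 4.26)
)

# Difference in mean pain: non-exercisers minus exercisers, per income level.
pain_means$gap <- round(pain_means$exercise_no - pain_means$exercise_yes, 2)

print(pain_means[, c("income2", "gap")])
```

The gap stays between 0.67 and 0.91 points at every income level.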

Research question 3:

rq3_data <- brfss2013 %>%
  filter(!is.na(internet), !is.na(addepev2))

rq3_data %>%
  group_by(internet) %>%
  summarise(
    n = n(),
    depression_rate = mean(addepev2 == "Yes") * 100
  )
## # A tibble: 2 × 3
##   internet      n depression_rate
##   <fct>     <int>           <dbl>
## 1 Yes      366087            19.3
## 2 No       118542            20.6
ggplot(rq3_data, aes(x = internet, fill = addepev2)) +
  geom_bar(position = "fill") +
  labs(
    title = "Depression Diagnosis by Internet Use",
    x = "Used Internet Past 30 Days",
    y = "Proportion",
    fill = "Ever Told Had Depression"
  ) +
  scale_y_continuous(labels = scales::percent)

The difference is surprisingly small. Non-internet users have a slightly higher rate of depression diagnosis (20.6%) than internet users (19.3%). This does not support the simple “screens cause depression” narrative. However, this is observational, so we can’t determine direction, and addepev2 records whether a respondent was ever told they had depression, not current symptoms. Non-internet users may also differ from internet users in age and income, neither of which is controlled for here, so the comparison may be confounded. The relationship between internet use and mental health is more complex than popular discourse suggests.
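To put a size on the difference, the two rates from the summary table can be compared in absolute and relative terms. This is simple arithmetic on the printed percentages, not a significance test, and the variable names are chosen for this illustration only.

```r
# Rates copied from the summary table above (percent ever told they had depression).
rate_internet    <- 19.3
rate_no_internet <- 20.6

diff_points <- rate_no_internet - rate_internet  # absolute difference, percentage points
rate_ratio  <- rate_no_internet / rate_internet  # relative difference

cat(sprintf("Difference: %.1f percentage points\n", diff_points))
cat(sprintf("Ratio: %.2f\n", rate_ratio))
```

The difference is 1.3 percentage points, a ratio of about 1.07.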