Data Collection
BRFSS collects data through random telephone surveys, both landline and cell, across all 50 states and US territories. One adult per household is randomly selected. Interviews cover health behaviors, chronic conditions, access to care, and demographics. The CDC and state health departments run the survey annually.
Generalizability: respondents are randomly sampled, the sample is large (about 492,000 in 2013), and geographic coverage is broad, so results generalize to non-institutionalized US adults aged 18 and over. Limitations: the survey misses people in institutions and people without phones, and it relies on self-report, which can introduce bias.
Causality: no causal claims are possible. The study is observational, with no random assignment, so it supports associations only. If we see a link between sleep and depression, we can’t say one causes the other.
Research question 1: Is there an association between sleep duration (sleptim1) and the number of poor mental health days (menthlth), and does this relationship differ by employment status (employ1)? Sleep and mental health are linked, but the pressure of work, or the lack of it, might change that dynamic. Unemployed and employed people face different stressors.
Research question 2: Among adults with arthritis (havarth3), is there a relationship between joint pain severity (joinpain) and physical activity (exerany2), and does income (income2) moderate this? Pain limits movement, but does money buy access to better management (physical therapy, gym memberships, medication) that keeps people active despite pain?
Research question 3: Is there an association between internet use (internet) and depression diagnosis (addepev2)? We talk about screen time and mental health constantly, but the relationship isn’t simple. Internet access also means access to resources, connection, and telehealth. This is worth examining in a large population sample.
* * *
Research question 1:
# Keep respondents with complete data and a plausible sleep value (at most 24 hours)
rq1_data <- brfss2013 %>%
  filter(!is.na(sleptim1), !is.na(menthlth), !is.na(employ1), sleptim1 <= 24)
# Summary statistics by employment status
rq1_data %>%
  group_by(employ1) %>%
  summarise(
    n = n(),
    mean_sleep = mean(sleptim1),
    mean_mental = mean(menthlth),
    median_sleep = median(sleptim1),
    median_mental = median(menthlth)
  )
## # A tibble: 8 × 6
##   employ1                    n mean_sleep mean_mental median_sleep median_mental
##   <fct>                  <int>      <dbl>       <dbl>        <dbl>         <dbl>
## 1 Employed for wages    198958       6.89        2.69            7             0
## 2 Self-employed          39082       7.08        2.39            7             0
## 3 Out of work for 1 ye…  13527       6.91        6.66            7             0
## 4 Out of work for less…  11906       6.99        5.88            7             0
## 5 A homemaker            30537       7.19        3.06            7             0
## 6 A student              12451       7.08        4.11            7             0
## 7 Retired               132637       7.35        2.12            7             0
## 8 Unable to work         34564       6.75        10.7            6             5
# Visualization
ggplot(rq1_data, aes(x = sleptim1, y = menthlth)) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~employ1) +
  labs(
    title = "Sleep vs. Poor Mental Health Days by Employment",
    x = "Hours of Sleep",
    y = "Days Mental Health Not Good"
  )
## `geom_smooth()` using formula = 'y ~ x'
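The “Unable to work” group shows the worst mental health (mean 10.7 bad
days, median 5) and the least sleep (mean 6.75 hours). Retired adults have
the best mental health (mean 2.12 bad days) and the most sleep (mean 7.35
hours). In every other group the median is 0, so at least half of those
respondents report no bad days at all. The scatterplots show a negative
relationship across all groups: less sleep is associated with more bad
mental health days. The fitted line is steepest for those unable to work,
so the association between short sleep and poor mental health is
strongest in this group. Because the data are observational, this does
not show that sleep loss causes the extra bad days.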
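The slope comparison above can be checked numerically rather than read off the plot. The sketch below is a minimal illustration on made-up toy data (the `toy` data frame and its values are hypothetical, not BRFSS results). It computes the least-squares slope of `menthlth` on `sleptim1` within each group, which is the slope that `geom_smooth(method = "lm")` draws in each facet.

```r
library(dplyr)

# Hypothetical toy data: two groups with exactly linear relationships,
# chosen so the expected slopes (-2 and -0.5) are obvious.
toy <- data.frame(
  employ1  = rep(c("Group A", "Group B"), each = 5),
  sleptim1 = rep(4:8, times = 2)
)
toy$menthlth <- ifelse(toy$employ1 == "Group A",
                       20 - 2 * toy$sleptim1,
                       10 - 0.5 * toy$sleptim1)

# For simple linear regression, the least-squares slope is cov(x, y) / var(x).
slopes <- toy %>%
  group_by(employ1) %>%
  summarise(
    n     = n(),
    slope = cov(sleptim1, menthlth) / var(sleptim1)
  )

print(slopes)
```

Assuming `rq1_data` exists as created earlier, replacing `toy` with `rq1_data` would give one slope per employment group, which can then be sorted to see which group's fitted line is steepest.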
Research question 2:
# Restrict to respondents with arthritis and complete pain, exercise, and income data
rq2_data <- brfss2013 %>%
  filter(havarth3 == "Yes") %>%
  filter(!is.na(joinpain), !is.na(exerany2), !is.na(income2))
# Joint pain summary by exercise status and income level
rq2_data %>%
  group_by(exerany2, income2) %>%
  summarise(
    n = n(),
    mean_pain = mean(joinpain),
    median_pain = median(joinpain)
  )
## `summarise()` has grouped output by 'exerany2'. You can override using the
## `.groups` argument.
## # A tibble: 16 × 5
## # Groups: exerany2 [2]
##    exerany2 income2               n mean_pain median_pain
##    <fct>    <fct>             <int>     <dbl>       <dbl>
##  1 Yes      Less than $10,000  5131      6.24           7
##  2 Yes      Less than $15,000  5989      5.73           6
##  3 Yes      Less than $20,000  7286      5.26           5
##  4 Yes      Less than $25,000  8861      4.73           5
##  5 Yes      Less than $35,000 10345      4.37           4
##  6 Yes      Less than $50,000 12876      4.00           4
##  7 Yes      Less than $75,000 12981      3.75           3
##  8 Yes      $75,000 or more   19884      3.35           3
##  9 No       Less than $10,000  4499      7.05           8
## 10 No       Less than $15,000  5214      6.40           7
## 11 No       Less than $20,000  5562      6.03           6
## 12 No       Less than $25,000  6034      5.57           6
## 13 No       Less than $35,000  5916      5.21           5
## 14 No       Less than $50,000  5987      4.78           5
## 15 No       Less than $75,000  4758      4.60           5
## 16 No       $75,000 or more    4892      4.26           4
ggplot(rq2_data, aes(x = income2, y = joinpain, fill = exerany2)) +
  geom_boxplot() +
  labs(
    title = "Joint Pain by Income and Exercise Status (Arthritis Patients)",
    x = "Income Level",
    y = "Joint Pain (0-10)",
    fill = "Exercised Past 30 Days"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Clear pattern: higher income is associated with lower pain, regardless of
exercise status, and exercisers report less pain than non-exercisers at
every income level. The gap between exercisers and non-exercisers is
roughly constant, between about 0.7 and 0.9 points at every income level
(mean pain 6.24 vs 7.05 for <$10K, a gap of 0.81; 3.35 vs 4.26 for $75K+,
a gap of 0.91). These group means therefore give little evidence that
income moderates the exercise-pain association: exercise and income each
relate to lower pain largely independently of the other. The direction is
also unclear, since people in less pain may simply find it easier to
exercise.
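To compare the exerciser and non-exerciser gap across income levels directly, the group means from the summary table above can be differenced. This is a small sketch that re-enters those printed means by hand (the data frame name `pain_means` and its column names are illustrative, not part of the original analysis). It works on rounded means only and says nothing about statistical uncertainty.

```r
# Mean joint pain by income level, copied from the summary table above.
pain_means <- data.frame(
  income2 = c("Less than $10,000", "Less than $15,000", "Less than $20,000",
              "Less than $25,000", "Less than $35,000", "Less than $50,000",
              "Less than $75,000", "$75,000 or more"),
  exercise_yes = c(6.24, 5.73, 5.26, 4.73, 4.37, 4.00, 3.75, 3.35),
  exercise_no  = c(7.05, 6.40, 6.03, 5.57, 5.21, 4.78, 4.60, 4.26)
)

# Gap = how much more pain non-exercisers report at each income level.
pain_means$gap <- round(pain_means$exercise_no - pain_means$exercise_yes, 2)

print(pain_means)
print(range(pain_means$gap))
```

A gap that shrinks or grows steadily with income would point toward moderation; a roughly flat gap would not. A more formal check would fit a model with an exercise-by-income interaction on the respondent-level data.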
Research question 3:
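# Keep respondents with complete internet-use and depression data
rq3_data <- brfss2013 %>%
  filter(!is.na(internet), !is.na(addepev2))
# Percentage ever told they had depression, by internet use
rq3_data %>%
  group_by(internet) %>%
  summarise(
    n = n(),
    depression_rate = mean(addepev2 == "Yes") * 100
  )
## # A tibble: 2 × 3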
##   internet      n depression_rate
##   <fct>     <int>           <dbl>
## 1 Yes      366087            19.3
## 2 No       118542            20.6
ggplot(rq3_data, aes(x = internet, fill = addepev2)) +
  geom_bar(position = "fill") +
  labs(
    title = "Depression Diagnosis by Internet Use",
    x = "Used Internet Past 30 Days",
    y = "Proportion",
    fill = "Ever Told Had Depression"
  ) +
  scale_y_continuous(labels = scales::percent)
The difference is surprisingly small. Non-internet users have a slightly higher depression rate (20.6%) than internet users (19.3%), a gap of 1.3 percentage points. This does not support the simple “screens cause depression” narrative, although “used the internet in the past 30 days” is a coarse measure that says nothing about how much time people spend online. The data are also observational, so we can’t determine direction. Non-internet users likely skew older and lower-income, and both age and income could confound the comparison. The relationship between internet use and mental health is more complex than popular discourse suggests.
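With samples this large, even a small difference in rates can be statistically detectable, so it helps to separate size from significance. The sketch below is an approximation under stated assumptions: the counts of depression diagnoses are reconstructed from the rounded rates in the table above (so they are not the exact survey counts), and the test treats the data as a simple random sample, ignoring BRFSS survey weights. It uses base R's `prop.test()` to estimate the difference in proportions with a confidence interval.

```r
# Group sizes and rounded depression rates, copied from the table above.
n_group    <- c(internet_yes = 366087, internet_no = 118542)
rate_group <- c(internet_yes = 0.193,  internet_no = 0.206)

# Assumption: approximate counts rebuilt from rounded rates, not exact counts.
x_group <- round(n_group * rate_group)

# Two-sample test of equal proportions (unweighted; ignores survey design).
res <- prop.test(x = x_group, n = n_group)

# Difference in rates (non-users minus users), in percentage points.
diff_pp <- unname(res$estimate[2] - res$estimate[1]) * 100

print(res)
cat("Difference (No - Yes):", round(diff_pp, 1), "percentage points\n")
```

A gap of this size can be statistically significant at this sample size and still be small in practical terms, so the test result alone should not be read as evidence of an important difference.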