Research Question:
Is there a statistically significant difference in weekly physical
activity levels between adults with low depressive symptoms and those
with moderate to severe depressive symptoms?
This project uses NHANES 2017 to 2018 data. The dataset includes health and lifestyle survey information from individuals in the United States. After merging the depression screener file and physical activity file, the dataset has 5533 observations and 27 variables.
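The merge itself is not shown in this report. As a rough sketch of what joining the two files on the respondent ID might look like, the example below uses small invented data frames in place of the real files. The names `dpq_toy`, `paq_toy`, and `merged_toy` and all values are assumptions for illustration only.

```r
# Toy stand-ins for the depression screener and physical activity files.
# All values are invented; only the SEQN key mirrors the NHANES layout.
dpq_toy <- data.frame(
  SEQN   = c(1001, 1002, 1003),
  DPQ010 = c(0, 2, 1)
)
paq_toy <- data.frame(
  SEQN   = c(1002, 1003, 1004),
  PAD615 = c(120, 60, 240)
)

# Inner join on SEQN: keep only respondents who appear in both files.
merged_toy <- merge(dpq_toy, paq_toy, by = "SEQN")

nrow(merged_toy)   # respondents present in both files
names(merged_toy)  # SEQN plus the columns from each file
```

With the real data, the two files would first be read from the NHANES .XPT transport files, for example with `haven::read_xpt()`, before the same kind of join is applied.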
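The main variables used in this project are the PHQ 9 depression score and physical activity minutes. The depression score is the sum of the nine screener items DPQ010 through DPQ090, each scored from 0 to 3, so the total ranges from 0 to 27. People with a PHQ 9 score below 10 are placed in the low depressive symptoms group. People with a PHQ 9 score of 10 or higher are placed in the moderate to severe depressive symptoms group. Physical activity is measured with the NHANES variable PAD615. The 2017 to 2018 physical activity codebook describes PAD615 as minutes of vigorous-intensity work activity on a typical day, while minutes of moderate recreational activity are stored in PAD675, so the results in this report describe PAD615 as coded.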
This topic was chosen because depression and physical inactivity are both major public health concerns, and understanding their relationship could inform treatment strategies.
Source: https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/default.aspx?BeginYear=2017
A two sample t test is appropriate because the research question compares the mean weekly physical activity minutes between two independent groups. The response variable, weekly physical activity minutes, is quantitative. The grouping variable, depression group, is categorical with two levels. Because t.test() in R does not assume equal variances by default, the test reported here is the Welch version. A boxplot is also useful because it shows the center, spread, and possible outliers of each group side by side.
The data were cleaned by merging the depression and physical activity files on the respondent ID (SEQN), summing the nine screener items into a PHQ 9 depression score, assigning depression groups, keeping only the needed variables, and dropping rows where PAD615 is missing or holds a refused or don't know code (7777 or 9999).
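library(dplyr)

project_data <- raw_data %>%
  mutate(
    # PHQ 9 total: sum of the nine items. With na.rm = TRUE, unanswered
    # items contribute 0 instead of making the score NA.
    DepressionScore = rowSums(
      select(., DPQ010, DPQ020, DPQ030, DPQ040, DPQ050, DPQ060, DPQ070, DPQ080, DPQ090),
      na.rm = TRUE
    ),
    # A cutoff of 10 separates low from moderate to severe symptoms.
    DepressionGroup = case_when(
      DepressionScore < 10 ~ "Low",
      DepressionScore >= 10 ~ "ModerateSevere"
    )
  ) %>%
  select(SEQN, DepressionScore, DepressionGroup, PAD615) %>%
  # Drop missing values and the special codes 7777 (refused) and 9999 (don't know).
  filter(
    !is.na(DepressionScore),
    !is.na(DepressionGroup),
    !is.na(PAD615),
    PAD615 < 7777
  )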
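One point worth checking in the scoring step: with `na.rm = TRUE`, an unanswered item counts as 0, and the NHANES screener items also use 7 for refused and 9 for don't know, which would be added to the total if left in place. The sketch below shows one possible stricter scoring on invented responses. The object names and values are assumptions, and this is not the scoring used for the results in this report.

```r
# Invented responses for four respondents on the nine screener items.
# Valid answers are 0-3; 7 = refused, 9 = don't know; NA = not answered.
item_names <- paste0("DPQ0", 1:9, "0")  # "DPQ010" ... "DPQ090"
answers <- matrix(
  c(0, 1, 0, 1, 0, 0, 0, 0, 0,   # all items valid, low total
    3, 3, 2, 2, 1, 1, 1, 0, 0,   # all items valid, high total
    7, 1, 1, 0, 0, 0, 0, 0, 0,   # first item refused
    1, NA, 2, 1, 0, 0, 0, 0, 0), # second item not answered
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, item_names)
)

# Scoring as in the report: NA counts as 0 and the code 7 is summed as is.
naive_score <- rowSums(answers, na.rm = TRUE)

# Stricter alternative: recode 7 and 9 to NA, then require all nine items.
clean <- answers
clean[which(clean > 3)] <- NA
strict_score <- rowSums(clean, na.rm = FALSE)
strict_group <- ifelse(strict_score >= 10, "ModerateSevere", "Low")

data.frame(naive_score, strict_score, strict_group)
```

Under the stricter rule, the last two respondents get a missing score and would be removed by the `!is.na(DepressionScore)` filter instead of being assigned to a group.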
project_data %>%
  group_by(DepressionGroup) %>%
  summarise(
    count = n(),
    mean_activity = mean(PAD615),
    median_activity = median(PAD615),
    sd_activity = sd(PAD615)
  )
## # A tibble: 2 × 5
##   DepressionGroup count mean_activity median_activity sd_activity
##   <chr>           <int>         <dbl>           <dbl>       <dbl>
## 1 Low              1190          198.             180        161.
## 2 ModerateSevere    122          195.             120        154.
ggplot(project_data, aes(x = DepressionGroup, y = PAD615, fill = DepressionGroup)) +
  geom_boxplot() +
  labs(
    title = "Weekly Physical Activity by Depression Group",
    x = "Depression Group",
    y = "Weekly Moderate Physical Activity Minutes"
  )
The null hypothesis states that there is no difference in mean weekly physical activity between the two depression groups. The alternative hypothesis states that there is a difference.
H₀: μ₁ = μ₂
Hₐ: μ₁ ≠ μ₂
μ₁ = mean weekly physical activity minutes for adults with low
depressive symptoms
μ₂ = mean weekly physical activity minutes for adults with moderate to
severe depressive symptoms
Significance level: α = 0.05
t_test_result <- t.test(PAD615 ~ DepressionGroup, data = project_data)
t_test_result
##
## Welch Two Sample t-test
##
## data: PAD615 by DepressionGroup
## t = 0.24913, df = 149.37, p-value = 0.8036
## alternative hypothesis: true difference in means between group Low and group ModerateSevere is not equal to 0
## 95 percent confidence interval:
##  -25.43676  32.77620
## sample estimates:
##            mean in group Low mean in group ModerateSevere
##                     198.2353                     194.5656
The p value is 0.8036.
Since the p value is greater than 0.05, we fail to reject H₀.
This means there is not enough statistical evidence to conclude that weekly physical activity levels are different between adults with low depressive symptoms and adults with moderate to severe depressive symptoms.
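The low depression group had a mean of 198.24 minutes per week compared to 194.57 minutes for the moderate to severe group, a difference of 3.67 minutes. The 95 percent confidence interval for the difference in means, from -25.44 to 32.78 minutes, includes 0, which is consistent with failing to reject H₀.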
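As a check on the reported statistic, the Welch calculation can be approximately reproduced by hand from the group counts, means, and standard deviations. This is only a sketch: the standard deviations are taken from the summary table, where they are rounded to whole numbers, so the values will be close to the t.test() output but not identical.

```r
# Summary values copied from the output in this report.
# SDs are rounded in the tibble, so results only approximate t.test().
n1 <- 1190; m1 <- 198.2353; s1 <- 161   # Low
n2 <- 122;  m2 <- 194.5656; s2 <- 154   # ModerateSevere

# Per-group variance of the sample mean.
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Welch standard error and t statistic for the difference in means.
se     <- sqrt(v1 + v2)
t_stat <- (m1 - m2) / se

# Welch-Satterthwaite approximation to the degrees of freedom.
df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p value and 95 percent confidence interval.
p_value <- 2 * pt(-abs(t_stat), df)
ci      <- (m1 - m2) + c(-1, 1) * qt(0.975, df) * se

round(c(t = t_stat, df = df, p = p_value), 4)
round(ci, 2)
```

Most of the standard error comes from the smaller group: its variance term (154² / 122) is much larger than the term for the low group (161² / 1190), which is also why the Welch degrees of freedom are far below the total sample size.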
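The boxplot and summary statistics show how physical activity is distributed within each depression group. The group means are nearly identical (about 198 versus 195 minutes), while the medians differ more (180 versus 120 minutes). In both groups the mean exceeds the median, which suggests right-skewed distributions with some high values. The t test gives the formal statistical answer by testing whether the difference in group means is statistically significant.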
This analysis directly answers the research question by comparing one quantitative outcome across two clearly defined groups and supporting the conclusion with a hypothesis test.
This project examined whether adults with low depressive symptoms and adults with moderate to severe depressive symptoms differ in weekly physical activity levels. The Welch two sample t test found no statistically significant difference in mean physical activity minutes between the two groups (t = 0.249, df = 149.37, p = 0.8036).
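A limitation is that the relationship between physical activity and depression is complex. Other variables, such as age, income, health status, sleep, and environment, may influence both. The groups are also very unequal in size (1190 versus 122 respondents), and the analysis does not apply the NHANES survey weights, so the results describe this sample rather than the U.S. adult population. Future research could include additional variables or use regression models to better understand these factors.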
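To illustrate the regression idea, the sketch below fits a linear model with depression group and one additional covariate. The data are simulated, not NHANES data, and the names `sim`, `Age`, and `ActivityMinutes` are hypothetical, so the output says nothing about the real relationship.

```r
# Simulated data for illustration only; these are not NHANES values.
set.seed(1)
n <- 200
sim <- data.frame(
  DepressionGroup = factor(
    sample(c("Low", "ModerateSevere"), n, replace = TRUE, prob = c(0.9, 0.1))
  ),
  Age = sample(18:80, n, replace = TRUE)
)

# Hypothetical outcome: activity declines with age, plus random noise.
sim$ActivityMinutes <- pmax(0, 200 - sim$Age + rnorm(n, sd = 60))

# Linear model: group difference in activity after adjusting for age.
fit <- lm(ActivityMinutes ~ DepressionGroup + Age, data = sim)
summary(fit)$coefficients
```

The coefficient labeled `DepressionGroupModerateSevere` estimates the difference in mean activity between the groups with age held fixed, which is the adjusted counterpart of the unadjusted comparison made by the t test.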
National Center for Health Statistics. NHANES 2017 to 2018.
https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/default.aspx?BeginYear=2017