Introduction

Research Question:
Is there a statistically significant difference in weekly physical activity levels between adults with low depressive symptoms and those with moderate to severe depressive symptoms?

This project uses data from the 2017-2018 cycle of the National Health and Nutrition Examination Survey (NHANES). The dataset includes health and lifestyle survey information from individuals in the United States. After merging the depression screener file and the physical activity file on the respondent identifier SEQN, the dataset has 5,533 observations and 27 variables.

The main variables used in this project are the PHQ-9 depression score and weekly moderate physical activity minutes. The depression score is the sum of the nine screener items DPQ010 through DPQ090. Each item is scored from 0 to 3, so the total ranges from 0 to 27. People with a PHQ-9 score below 10 are placed in the low depressive symptoms group, and people with a score of 10 or higher are placed in the moderate to severe depressive symptoms group. Physical activity is measured with the NHANES variable PAD615, used here as weekly moderate recreational activity minutes.
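The scoring rule described above can be sketched on a small made-up data frame. The name `toy` and all of its values are illustrative and are not NHANES records. The sketch assumes, following the NHANES codebook, that item responses of 7 and 9 mean "refused" and "don't know", and it recodes them to missing before summing. This is a stricter rule than summing the raw codes, so it should be read as one possible approach rather than the method used in this project.

```r
library(dplyr)

# Hypothetical toy responses for three people; not real NHANES data.
items <- paste0("DPQ0", 1:9, "0")   # "DPQ010" ... "DPQ090"
toy <- as.data.frame(rbind(
  c(0, 1, 0, 1, 0, 0, 0, 0, 0),   # item total of 2
  c(3, 2, 2, 1, 1, 1, 1, 0, 0),   # item total of 11
  c(1, 9, 0, 0, 0, 0, 0, 0, 0)    # contains one "don't know" code
))
names(toy) <- items

scored <- toy %>%
  # Assumption: codes 7 and 9 are non-answers, so treat them as missing.
  mutate(across(all_of(items), ~ ifelse(.x %in% c(7, 9), NA, .x))) %>%
  mutate(
    # Without na.rm, any missing item makes the whole score NA.
    DepressionScore = rowSums(across(all_of(items))),
    DepressionGroup = case_when(
      DepressionScore < 10  ~ "Low",
      DepressionScore >= 10 ~ "ModerateSevere"
    )
  )

scored[, c("DepressionScore", "DepressionGroup")]
```

Under this rule, a respondent with any unanswered item gets a missing score and group, and would be dropped by a later `!is.na()` filter instead of being scored as if the item were 0.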

This topic was chosen because depression and physical inactivity are both major public health concerns, and understanding their relationship could inform treatment strategies.

Source: https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/default.aspx?BeginYear=2017

Justification of Approach

A two-sample t test is appropriate because the research question compares the mean weekly physical activity minutes between two independent groups. The response variable, weekly physical activity minutes, is quantitative. The grouping variable, depression group, is categorical with two levels. The test reported below is Welch's version, which does not assume that the two groups have equal variances. A boxplot is also useful because it visually compares the spread, center, and possible outliers between the two groups.

Data Analysis

The data were cleaned in five steps: merging the depression and physical activity files on SEQN, computing the PHQ-9 depression score, assigning depression groups, selecting the needed variables, and removing rows where PAD615 is missing or holds one of the codes 7777 (refused) or 9999 (don't know).
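The merge step is not shown in the code that follows, which starts from an already merged `raw_data`. A minimal sketch of one way to do it, using toy data frames, is given here. The names `dpq_toy`, `paq_toy`, `merged_toy`, and `cleaned_toy` and all values are made up for illustration, and the choice of an inner join is an assumption about how the files were combined.

```r
library(dplyr)

# Hypothetical stand-ins for the depression screener and physical activity files.
dpq_toy <- data.frame(SEQN = c(1, 2, 3), DPQ010 = c(0, 1, 3))
paq_toy <- data.frame(SEQN = c(2, 3, 4), PAD615 = c(120, 9999, 60))

# Keep only respondents that appear in both files, matched on SEQN.
merged_toy <- inner_join(dpq_toy, paq_toy, by = "SEQN")

# Drop missing values and the refused / don't know codes (7777, 9999).
cleaned_toy <- merged_toy %>%
  filter(!is.na(PAD615), PAD615 < 7777)

nrow(merged_toy)    # respondents present in both files
nrow(cleaned_toy)   # respondents left after removing invalid codes
```

A left join would instead keep every respondent from the first file and fill unmatched activity values with NA, which the later `!is.na()` filter would then remove.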

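library(dplyr)    # %>%, mutate(), select(), filter(), case_when()
library(ggplot2)  # used for the boxplot further below

# raw_data: depression screener and physical activity files merged on SEQN
project_data <- raw_data %>%
  mutate(
    # PHQ-9 total: sum of the nine screener items.
    # na.rm = TRUE counts unanswered items as 0, so the total is never NA.
    DepressionScore = rowSums(
      select(., DPQ010, DPQ020, DPQ030, DPQ040, DPQ050, DPQ060, DPQ070, DPQ080, DPQ090),
      na.rm = TRUE
    ),
    # A cutoff of 10 separates low from moderate to severe symptoms
    DepressionGroup = case_when(
      DepressionScore < 10 ~ "Low",
      DepressionScore >= 10 ~ "ModerateSevere"
    )
  ) %>%
  select(SEQN, DepressionScore, DepressionGroup, PAD615) %>%
  filter(
    !is.na(DepressionScore),
    !is.na(DepressionGroup),
    !is.na(PAD615),
    PAD615 < 7777   # drops the codes 7777 (refused) and 9999 (don't know)
  )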
# Group sizes and summary statistics of PAD615 by depression group
project_data %>%
  group_by(DepressionGroup) %>%
  summarise(
    count = n(),
    mean_activity = mean(PAD615),
    median_activity = median(PAD615),
    sd_activity = sd(PAD615)
  )
## # A tibble: 2 × 5
##   DepressionGroup count mean_activity median_activity sd_activity
##   <chr>           <int>         <dbl>           <dbl>       <dbl>
## 1 Low              1190          198.             180        161.
## 2 ModerateSevere    122          195.             120        154.
# Side-by-side boxplots of PAD615 for the two depression groups
ggplot(project_data, aes(x = DepressionGroup, y = PAD615, fill = DepressionGroup)) +
  geom_boxplot() +
  labs(
    title = "Weekly Physical Activity by Depression Group",
    x = "Depression Group",
    y = "Weekly Moderate Physical Activity Minutes"
  )

Statistical Analysis

The null hypothesis states that there is no difference in mean weekly physical activity between the two depression groups. The alternative hypothesis states that there is a difference.

H₀: μ₁ = μ₂
Hₐ: μ₁ ≠ μ₂

μ₁ = mean weekly physical activity minutes for adults with low depressive symptoms
μ₂ = mean weekly physical activity minutes for adults with moderate to severe depressive symptoms

Significance level: α = 0.05

# t.test() runs Welch's test by default (var.equal = FALSE)
t_test_result <- t.test(PAD615 ~ DepressionGroup, data = project_data)
t_test_result
## 
##  Welch Two Sample t-test
## 
## data:  PAD615 by DepressionGroup
## t = 0.24913, df = 149.37, p-value = 0.8036
## alternative hypothesis: true difference in means between group Low and group ModerateSevere is not equal to 0
## 95 percent confidence interval:
##  -25.43676  32.77620
## sample estimates:
##            mean in group Low mean in group ModerateSevere 
##                     198.2353                     194.5656

The p-value is 0.8036.

Because the p-value is greater than α = 0.05, we fail to reject H₀.

This means there is not enough statistical evidence to conclude that mean weekly physical activity differs between adults with low depressive symptoms and adults with moderate to severe depressive symptoms. Failing to reject H₀ does not show that the two means are equal.
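The test statistic, degrees of freedom, p-value, and confidence interval can be approximately reconstructed from the group summaries alone. The sketch below is base R only and applies the standard Welch formulas to the counts, means, and standard deviations printed earlier. Because the standard deviations are rounded in that table, the results should come out close to the `t.test()` output but not identical. The variable names are chosen here for illustration.

```r
# Summary statistics copied from the output above (SDs are rounded there).
n1 <- 1190; m1 <- 198.2353; s1 <- 161   # Low
n2 <- 122;  m2 <- 194.5656; s2 <- 154   # ModerateSevere

v1 <- s1^2 / n1         # squared standard error of the first group mean
v2 <- s2^2 / n2         # squared standard error of the second group mean
se <- sqrt(v1 + v2)     # standard error of the difference in means

t_stat <- (m1 - m2) / se

# Welch-Satterthwaite approximation to the degrees of freedom.
df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value and 95% confidence interval for the difference.
p_value <- 2 * pt(-abs(t_stat), df)
ci <- (m1 - m2) + c(-1, 1) * qt(0.975, df) * se

round(c(t = t_stat, df = df, p = p_value), 3)
round(ci, 2)
```

The smaller group contributes most of the standard error because its variance is divided by 122 rather than 1190, which is also why the degrees of freedom are far below the total sample size.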

Discussion of Results

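After cleaning, 1,312 respondents remained: 1,190 in the low depression group and 122 in the moderate to severe group. The low depression group had a mean of 198.24 minutes per week compared to 194.57 minutes for the moderate to severe group, a difference of 3.67 minutes.

The summary statistics and boxplot describe how activity is distributed within each group. The standard deviations are similar (about 161 and 154 minutes), but the medians differ more than the means (180 versus 120 minutes). In both groups the mean exceeds the median, which suggests right-skewed distributions. The t test gives the formal answer for the means: the observed difference of 3.67 minutes is small relative to its 95% confidence interval of -25.44 to 32.78 minutes, which includes 0, so the difference is not statistically significant.

Together, these results answer the research question by comparing one quantitative outcome across two clearly defined groups and supporting the conclusion with a hypothesis test.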

Conclusion and Future Directions

This project examined whether adults with low depressive symptoms and adults with moderate to severe depressive symptoms differ in weekly physical activity levels. The Welch two-sample t test (t = 0.249, df = 149.37, p = 0.8036) did not provide evidence of a difference in mean physical activity minutes between the two groups.

A limitation is that physical activity and depression are complex. Other variables, such as age, income, health status, sleep, and environment, may also influence this relationship. Future research could include additional variables or use regression models to better understand these factors.
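As a rough illustration of the regression idea, the sketch below fits a linear model that compares the two groups while adjusting for one extra variable. It runs on simulated data only. The data frame `sim`, the columns `Age` and `ActivityMinutes`, and every number in it are hypothetical and say nothing about NHANES or about the true relationship.

```r
set.seed(1)  # simulated data only; not NHANES records

n <- 200
sim <- data.frame(
  DepressionGroup = factor(sample(c("Low", "ModerateSevere"), n, replace = TRUE)),
  Age = round(runif(n, 18, 80))
)

# Hypothetical outcome: activity declines with age, with no true group effect.
sim$ActivityMinutes <- 250 - 1.5 * sim$Age + rnorm(n, sd = 40)

# Linear model: group difference in activity, adjusted for age.
fit <- lm(ActivityMinutes ~ DepressionGroup + Age, data = sim)
summary(fit)$coefficients
```

In this model the coefficient labelled `DepressionGroupModerateSevere` estimates the difference between groups at a fixed age. With real activity minutes, which appear right-skewed here, a transformation or a different model family might be worth considering.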

References

National Center for Health Statistics. National Health and Nutrition Examination Survey (NHANES), 2017-2018.
https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/default.aspx?BeginYear=2017