Introduction

Research Question: Do rates of receiving counseling or therapy in the last 4 weeks differ across age groups in the United States?

For this project, I used the dataset “Mental Health Care in the Last 4 Weeks”, published by the Centers for Disease Control and Prevention (CDC) through Data.gov. The data come from the Household Pulse Survey, which collected information about mental health care in the United States during the COVID-19 pandemic. The full dataset contains 10,404 observations and 15 variables. Each case is a mental health care estimate, described by variables such as indicator, group, subgroup, state, time period, and value.

I chose this topic because mental health is a major public health issue, and it was especially pressing during the COVID-19 pandemic, when the data were collected. More specifically, I want to find out whether some age groups receive counseling or therapy at higher rates than others. This information can help identify which age groups in the U.S. may not be receiving the help they need. Making sure that no group receives less care than others is an important issue.

Loading Dataset

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
mental_health <- read.csv("Mental_Health_Care_in_the_Last_4_Weeks.csv")

head(mental_health)
##                                                       Indicator  Group
## 1                  Received Counseling or Therapy, Last 4 Weeks By Sex
## 2                  Received Counseling or Therapy, Last 4 Weeks By Sex
## 3 Needed Counseling or Therapy But Did Not Get It, Last 4 Weeks By Sex
## 4  Took Prescription Medication for Mental Health, Last 4 Weeks By Age
## 5  Took Prescription Medication for Mental Health, Last 4 Weeks By Age
## 6  Took Prescription Medication for Mental Health, Last 4 Weeks By Age
##           State      Subgroup Phase Time.Period          Time.Period.Label
## 1 United States          Male     2          15      Sep 16 - Sep 28, 2020
## 2 United States        Female     2          15      Sep 16 - Sep 28, 2020
## 3 United States        Female    -1           1 Dec 22, 2020 - Jan 5, 2021
## 4 United States 50 - 59 years    -1           1      Mar 30 - Apr 13, 2021
## 5 United States 60 - 69 years    -1           1      Mar 30 - Apr 13, 2021
## 6 United States 70 - 79 years    -1           1      Mar 30 - Apr 13, 2021
##   Time.Period.Start.Date Time.Period.End.Date Value LowCI HighCI
## 1             09/16/2020           09/28/2020   6.9   6.5    7.3
## 2             09/16/2020           09/28/2020  11.0  10.4   11.6
## 3             12/22/2020           01/05/2021    NA    NA     NA
## 4             03/30/2021           04/13/2021    NA    NA     NA
## 5             03/30/2021           04/13/2021    NA    NA     NA
## 6             03/30/2021           04/13/2021    NA    NA     NA
##   Confidence.Interval Quartile.Range Suppression.Flag
## 1           6.5 - 7.3                              NA
## 2         10.4 - 11.6                              NA
## 3                                                  NA
## 4                                                  NA
## 5                                                  NA
## 6                                                  NA
dim(mental_health)
## [1] 10404    15
names(mental_health)
##  [1] "Indicator"              "Group"                  "State"                 
##  [4] "Subgroup"               "Phase"                  "Time.Period"           
##  [7] "Time.Period.Label"      "Time.Period.Start.Date" "Time.Period.End.Date"  
## [10] "Value"                  "LowCI"                  "HighCI"                
## [13] "Confidence.Interval"    "Quartile.Range"         "Suppression.Flag"

Analysis

mental_health |>
  count(Indicator)
##                                                                                            Indicator
## 1                                      Needed Counseling or Therapy But Did Not Get It, Last 4 Weeks
## 2                                                       Received Counseling or Therapy, Last 4 Weeks
## 3 Took Prescription Medication for Mental Health And/Or Received Counseling or Therapy, Last 4 Weeks
## 4                                       Took Prescription Medication for Mental Health, Last 4 Weeks
##      n
## 1 2601
## 2 2601
## 3 2601
## 4 2601
mental_health |>
  count(Group)
##                                            Group    n
## 1                                         By Age 1064
## 2                           By Disability status  168
## 3                                   By Education  608
## 4                             By Gender identity  156
## 5  By Presence of Symptoms of Anxiety/Depression  304
## 6                     By Race/Hispanic ethnicity  760
## 7                                         By Sex  304
## 8                          By Sexual orientation  156
## 9                                       By State 6732
## 10                             National Estimate  152

Data Cleaning

mental_health_age <- mental_health |>
  filter(Indicator == "Received Counseling or Therapy, Last 4 Weeks") |>
  filter(Group == "By Age") |>
  filter(!is.na(Value)) |>
  select(Indicator, Group, Subgroup, Time.Period.Label, Value) |>
  mutate(Subgroup = as.factor(Subgroup)) |>
  arrange(Subgroup)

head(mental_health_age)
##                                      Indicator  Group      Subgroup
## 1 Received Counseling or Therapy, Last 4 Weeks By Age 18 - 29 years
## 2 Received Counseling or Therapy, Last 4 Weeks By Age 18 - 29 years
## 3 Received Counseling or Therapy, Last 4 Weeks By Age 18 - 29 years
## 4 Received Counseling or Therapy, Last 4 Weeks By Age 18 - 29 years
## 5 Received Counseling or Therapy, Last 4 Weeks By Age 18 - 29 years
## 6 Received Counseling or Therapy, Last 4 Weeks By Age 18 - 29 years
##       Time.Period.Label Value
## 1 Aug 19 - Aug 31, 2020  12.2
## 2  Sep 2 - Sep 14, 2020  13.0
## 3 Sep 16 - Sep 28, 2020  12.1
## 4 Sep 30 - Oct 12, 2020  12.5
## 5 Oct 14 - Oct 26, 2020  14.5
## 6  Oct 28 - Nov 9, 2020  14.8
dim(mental_health_age)
## [1] 225   5
mental_health_age |>
  count(Subgroup)
##             Subgroup  n
## 1      18 - 29 years 33
## 2      30 - 39 years 33
## 3      40 - 49 years 33
## 4      50 - 59 years 33
## 5      60 - 69 years 33
## 6      70 - 79 years 33
## 7 80 years and above 27

Data Analysis

I began by narrowing the original dataset to the “Received Counseling or Therapy, Last 4 Weeks” indicator, because my research question concerns only counseling or therapy, not the other mental health measures in the dataset. Next, I filtered the data to include only the “By Age” group, which let me compare values across age groups. I removed rows with missing entries in the “Value” column, because Value is the variable being analyzed. I then selected only the variables needed for the project: Indicator, Group, Subgroup, Time.Period.Label, and Value. This left 225 estimates across seven age groups. Finally, I summarized the data by age group to find each group’s mean.

Summary Statistics

age_summary <- mental_health_age |>
  group_by(Subgroup) |>
  summarize(
    mean_value = mean(Value),
    sd_value = sd(Value),
    n = n()
  )

age_summary
## # A tibble: 7 × 4
##   Subgroup           mean_value sd_value     n
##   <fct>                   <dbl>    <dbl> <int>
## 1 18 - 29 years           15.2     1.79     33
## 2 30 - 39 years           14.5     1.18     33
## 3 40 - 49 years           11.4     0.705    33
## 4 50 - 59 years            9.05    0.656    33
## 5 60 - 69 years            6.11    0.503    33
## 6 70 - 79 years            4.17    0.487    33
## 7 80 years and above       3.65    1.56     27

Boxplot

boxplot(Value ~ Subgroup,
        data = mental_health_age,
        main = "Counseling or Therapy Use by Age Group",
        xlab = "Age Group",
        ylab = "Percent Receiving Counseling",
        las = 2)

ANOVA test

anova_model <- aov(Value ~ Subgroup, data = mental_health_age)

summary(anova_model)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## Subgroup      6   4263   710.5   607.1 <2e-16 ***
## Residuals   218    255     1.2                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Tukey HSD

TukeyHSD(anova_model)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Value ~ Subgroup, data = mental_health_age)
## 
## $Subgroup
##                                         diff        lwr         upr     p adj
## 30 - 39 years-18 - 29 years       -0.7787879  -1.571401   0.0138254 0.0576892
## 40 - 49 years-18 - 29 years       -3.8151515  -4.607765  -3.0225382 0.0000000
## 50 - 59 years-18 - 29 years       -6.1878788  -6.980492  -5.3952655 0.0000000
## 60 - 69 years-18 - 29 years       -9.1303030  -9.922916  -8.3376897 0.0000000
## 70 - 79 years-18 - 29 years      -11.0727273 -11.865341 -10.2801140 0.0000000
## 80 years and above-18 - 29 years -11.5942761 -12.429764 -10.7587883 0.0000000
## 40 - 49 years-30 - 39 years       -3.0363636  -3.828977  -2.2437504 0.0000000
## 50 - 59 years-30 - 39 years       -5.4090909  -6.201704  -4.6164776 0.0000000
## 60 - 69 years-30 - 39 years       -8.3515152  -9.144128  -7.5589019 0.0000000
## 70 - 79 years-30 - 39 years      -10.2939394 -11.086553  -9.5013261 0.0000000
## 80 years and above-30 - 39 years -10.8154882 -11.650976  -9.9800005 0.0000000
## 50 - 59 years-40 - 49 years       -2.3727273  -3.165341  -1.5801140 0.0000000
## 60 - 69 years-40 - 49 years       -5.3151515  -6.107765  -4.5225382 0.0000000
## 70 - 79 years-40 - 49 years       -7.2575758  -8.050189  -6.4649625 0.0000000
## 80 years and above-40 - 49 years  -7.7791246  -8.614612  -6.9436368 0.0000000
## 60 - 69 years-50 - 59 years       -2.9424242  -3.735038  -2.1498110 0.0000000
## 70 - 79 years-50 - 59 years       -4.8848485  -5.677462  -4.0922352 0.0000000
## 80 years and above-50 - 59 years  -5.4063973  -6.241885  -4.5709095 0.0000000
## 70 - 79 years-60 - 69 years       -1.9424242  -2.735038  -1.1498110 0.0000000
## 80 years and above-60 - 69 years  -2.4639731  -3.299461  -1.6284853 0.0000000
## 80 years and above-70 - 79 years  -0.5215488  -1.357037   0.3139389 0.5106329

Statistical Analysis

For this project, I used a one-way ANOVA to compare the mean percentage of people receiving counseling or therapy across age groups. Because the comparison involves more than two groups (seven age groups), ANOVA is an appropriate choice. The dependent variable was “Value”, the percent of people who received counseling or therapy in the last 4 weeks. The independent variable was “Subgroup”, the age group.
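One-way ANOVA also assumes roughly equal variances across groups, and the standard deviations in the summary table differ noticeably. The sketch below is not part of the original analysis: it uses only the rounded means, standard deviations, and counts printed in `age_summary` to compute a rule-of-thumb SD ratio and Welch's one-way ANOVA, which does not assume equal variances. The variable names (`m`, `s`, `n`, `sd_ratio`, `welch_f`) are illustrative, and the "ratio above 2" threshold is a common heuristic rather than a formal test.

```r
# Summary statistics copied from the printed age_summary table (rounded values).
m <- c(15.2, 14.5, 11.4, 9.05, 6.11, 4.17, 3.65)      # group means
s <- c(1.79, 1.18, 0.705, 0.656, 0.503, 0.487, 1.56)  # group standard deviations
n <- c(33, 33, 33, 33, 33, 33, 27)                    # estimates per group
k <- length(m)

# Heuristic check (an assumption, not a formal test): a largest-to-smallest
# SD ratio well above 2 hints that the equal-variance assumption is shaky.
sd_ratio <- max(s) / min(s)

# Welch's one-way ANOVA computed directly from the summary statistics.
w <- n / s^2                                   # precision weights
m_w <- sum(w * m) / sum(w)                     # variance-weighted grand mean
tmp <- sum((1 - w / sum(w))^2 / (n - 1)) / (k^2 - 1)
welch_f <- sum(w * (m - m_w)^2) / ((k - 1) * (1 + 2 * (k - 2) * tmp))
welch_df2 <- 1 / (3 * tmp)                     # adjusted denominator df
welch_p <- pf(welch_f, k - 1, welch_df2, lower.tail = FALSE)

round(c(sd_ratio = sd_ratio, welch_F = welch_f, df2 = welch_df2), 2)
welch_p
```

With the full data loaded, `oneway.test(Value ~ Subgroup, data = mental_health_age, var.equal = FALSE)` should run the same kind of test without rounding error.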

The null hypothesis is that the mean percent of people receiving counseling or therapy is the same amongst all age groups.

H₀: μ₁ = μ₂ = μ₃ = μ₄ = μ₅ = μ₆ = μ₇

Hₐ: At least one age group mean is different.

The ANOVA test produced a p-value of less than 2e-16 (F = 607.1, with 6 and 218 degrees of freedom), far below the .05 significance level. We can therefore reject the null hypothesis and conclude that the mean percentage of people receiving counseling differs for at least one age group. The boxplot shows the same pattern, with the younger age groups sitting well above the older ones.
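To make the ANOVA table less of a black box, the F statistic can be approximately rebuilt from the summary statistics alone. This is a minimal sketch that assumes only the rounded values printed in `age_summary`, so the results should land close to, but not exactly on, the `aov()` output. The eta-squared value at the end is an extra effect-size measure that the original analysis does not report.

```r
# Summary statistics copied from the printed age_summary table (rounded values).
m <- c(15.2, 14.5, 11.4, 9.05, 6.11, 4.17, 3.65)      # group means
s <- c(1.79, 1.18, 0.705, 0.656, 0.503, 0.487, 1.56)  # group standard deviations
n <- c(33, 33, 33, 33, 33, 33, 27)                    # estimates per group

k <- length(m)   # number of age groups
N <- sum(n)      # total number of estimates

# Between-group variation: how far each group mean sits from the grand mean.
grand_mean <- sum(n * m) / N
ss_between <- sum(n * (m - grand_mean)^2)

# Within-group variation: pooled from the group standard deviations.
ss_within <- sum((n - 1) * s^2)

df_between <- k - 1
df_within <- N - k

# F is the ratio of the two mean squares.
f_stat <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# Eta squared: share of total variation explained by age group (not in the original).
eta_sq <- ss_between / (ss_between + ss_within)

round(c(SS_between = ss_between, SS_within = ss_within, F = f_stat, eta_sq = eta_sq), 3)
```

The degrees of freedom come out to 6 and 218, matching the `Df` column of the ANOVA table, and the sums of squares should be close to the printed 4263 and 255.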

After the ANOVA, I used the Tukey HSD post hoc test to identify which specific age groups differed from each other. Of the 21 pairwise comparisons, 19 were statistically significant. The two exceptions were 18-29 versus 30-39, with an adjusted P value of .0577, and 70-79 versus 80 and above, with an adjusted P value of .5106. Overall, the results suggest that younger adults generally had higher average rates of receiving counseling than older adults.
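The two non-significant comparisons can be checked by hand with the Tukey-Kramer formula, which builds each interval from the studentized range distribution. The sketch below is illustrative rather than the method used in the report: the helper `tukey_pair()` is a hypothetical name, and the residual mean square is recovered as Mean Sq divided by F from the printed ANOVA table, so the results are approximate.

```r
# Inputs taken from the printed ANOVA table.
mse <- 710.5 / 607.1   # residual mean square, recovered as Mean Sq (Subgroup) / F
df_resid <- 218        # residual degrees of freedom
k <- 7                 # number of age groups

# Hypothetical helper: Tukey-Kramer interval and adjusted p-value for one pair.
tukey_pair <- function(diff, n1, n2) {
  se <- sqrt(mse * (1 / n1 + 1 / n2))   # standard error of the difference
  half_width <- qtukey(0.95, nmeans = k, df = df_resid) / sqrt(2) * se
  p_adj <- ptukey(abs(diff) / se * sqrt(2), nmeans = k, df = df_resid,
                  lower.tail = FALSE)
  c(diff = diff, lwr = diff - half_width, upr = diff + half_width, p_adj = p_adj)
}

# Differences copied from the TukeyHSD output; group sizes from count(Subgroup).
pair_young <- tukey_pair(-0.7787879, 33, 33)   # 30-39 vs 18-29
pair_old   <- tukey_pair(-0.5215488, 27, 33)   # 80 and above vs 70-79

round(rbind(`30-39 vs 18-29` = pair_young, `80+ vs 70-79` = pair_old), 4)
```

Both intervals should include zero, which is why these two pairs are the only ones that are not significant at the .05 level. The interval for the oldest pair is slightly wider because the 80 and above group has 27 estimates instead of 33.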

Conclusion

The results show that rates of receiving counseling or therapy in the last 4 weeks, during the COVID-19 pandemic, were not the same across all age groups. The ANOVA test had a P value far below .05, so the null hypothesis was rejected. The summary statistics and the boxplot both show a general trend of younger adults receiving counseling at higher average rates than older adults.

These findings answer the research question: age group is associated with differences in the percentage of people receiving counseling or therapy. This doesn’t prove that age directly causes lower counseling rates, but it does show an association. One limitation of this study is that the data are reported percentages, which are survey estimates and can sometimes be skewed. Additionally, the project doesn’t explain why younger adults receive counseling at higher rates than older adults. A future study could examine other variables, such as income, insurance, gender, or race, to better understand what influences mental health care.

References

Centers for Disease Control and Prevention. (2025). Mental Health Care in the Last 4 Weeks. U.S. Department of Health & Human Services, Data.gov. Retrieved from Data.gov.