For this project, I used the dataset “Mental Health Care in the Last 4 Weeks,” published by the Centers for Disease Control and Prevention (CDC) through Data.gov. The data come from the Household Pulse Survey, which collected information on mental health care in the United States during the COVID-19 pandemic. The full dataset contains 10,404 observations and 15 variables. Each row is a mental health care estimate, described by variables such as indicator, group, subgroup, state, time period, and value.
I chose this topic because mental health is a major public health issue, and concerns about it were especially pressing during the COVID-19 pandemic, when these data were collected. Specifically, I want to find out whether certain age groups receive counseling or therapy at higher rates than others. If such differences exist, they could help identify which age groups in the U.S. are not receiving the help they need. Ensuring that particular groups, especially marginalized ones, do not receive less care than others is an important issue.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
mental_health <- read.csv("Mental_Health_Care_in_the_Last_4_Weeks.csv")
head(mental_health)
## Indicator Group
## 1 Received Counseling or Therapy, Last 4 Weeks By Sex
## 2 Received Counseling or Therapy, Last 4 Weeks By Sex
## 3 Needed Counseling or Therapy But Did Not Get It, Last 4 Weeks By Sex
## 4 Took Prescription Medication for Mental Health, Last 4 Weeks By Age
## 5 Took Prescription Medication for Mental Health, Last 4 Weeks By Age
## 6 Took Prescription Medication for Mental Health, Last 4 Weeks By Age
## State Subgroup Phase Time.Period Time.Period.Label
## 1 United States Male 2 15 Sep 16 - Sep 28, 2020
## 2 United States Female 2 15 Sep 16 - Sep 28, 2020
## 3 United States Female -1 1 Dec 22, 2020 - Jan 5, 2021
## 4 United States 50 - 59 years -1 1 Mar 30 - Apr 13, 2021
## 5 United States 60 - 69 years -1 1 Mar 30 - Apr 13, 2021
## 6 United States 70 - 79 years -1 1 Mar 30 - Apr 13, 2021
## Time.Period.Start.Date Time.Period.End.Date Value LowCI HighCI
## 1 09/16/2020 09/28/2020 6.9 6.5 7.3
## 2 09/16/2020 09/28/2020 11.0 10.4 11.6
## 3 12/22/2020 01/05/2021 NA NA NA
## 4 03/30/2021 04/13/2021 NA NA NA
## 5 03/30/2021 04/13/2021 NA NA NA
## 6 03/30/2021 04/13/2021 NA NA NA
## Confidence.Interval Quartile.Range Suppression.Flag
## 1 6.5 - 7.3 NA
## 2 10.4 - 11.6 NA
## 3 NA
## 4 NA
## 5 NA
## 6 NA
dim(mental_health)
## [1] 10404 15
names(mental_health)
## [1] "Indicator" "Group" "State"
## [4] "Subgroup" "Phase" "Time.Period"
## [7] "Time.Period.Label" "Time.Period.Start.Date" "Time.Period.End.Date"
## [10] "Value" "LowCI" "HighCI"
## [13] "Confidence.Interval" "Quartile.Range" "Suppression.Flag"
mental_health |>
  count(Indicator)
## Indicator
## 1 Needed Counseling or Therapy But Did Not Get It, Last 4 Weeks
## 2 Received Counseling or Therapy, Last 4 Weeks
## 3 Took Prescription Medication for Mental Health And/Or Received Counseling or Therapy, Last 4 Weeks
## 4 Took Prescription Medication for Mental Health, Last 4 Weeks
## n
## 1 2601
## 2 2601
## 3 2601
## 4 2601
mental_health |>
  count(Group)
## Group n
## 1 By Age 1064
## 2 By Disability status 168
## 3 By Education 608
## 4 By Gender identity 156
## 5 By Presence of Symptoms of Anxiety/Depression 304
## 6 By Race/Hispanic ethnicity 760
## 7 By Sex 304
## 8 By Sexual orientation 156
## 9 By State 6732
## 10 National Estimate 152
mental_health_age <- mental_health |>
  # Keep only the counseling/therapy indicator
  filter(Indicator == "Received Counseling or Therapy, Last 4 Weeks") |>
  # Keep only the age-group breakdown
  filter(Group == "By Age") |>
  # Drop rows with no estimate
  filter(!is.na(Value)) |>
  select(Indicator, Group, Subgroup, Time.Period.Label, Value) |>
  mutate(Subgroup = as.factor(Subgroup)) |>
  arrange(Subgroup)
head(mental_health_age)
## Indicator Group Subgroup
## 1 Received Counseling or Therapy, Last 4 Weeks By Age 18 - 29 years
## 2 Received Counseling or Therapy, Last 4 Weeks By Age 18 - 29 years
## 3 Received Counseling or Therapy, Last 4 Weeks By Age 18 - 29 years
## 4 Received Counseling or Therapy, Last 4 Weeks By Age 18 - 29 years
## 5 Received Counseling or Therapy, Last 4 Weeks By Age 18 - 29 years
## 6 Received Counseling or Therapy, Last 4 Weeks By Age 18 - 29 years
## Time.Period.Label Value
## 1 Aug 19 - Aug 31, 2020 12.2
## 2 Sep 2 - Sep 14, 2020 13.0
## 3 Sep 16 - Sep 28, 2020 12.1
## 4 Sep 30 - Oct 12, 2020 12.5
## 5 Oct 14 - Oct 26, 2020 14.5
## 6 Oct 28 - Nov 9, 2020 14.8
dim(mental_health_age)
## [1] 225 5
mental_health_age |>
  count(Subgroup)
## Subgroup n
## 1 18 - 29 years 33
## 2 30 - 39 years 33
## 3 40 - 49 years 33
## 4 50 - 59 years 33
## 5 60 - 69 years 33
## 6 70 - 79 years 33
## 7 80 years and above 27
I began by narrowing the original dataset to the indicator “Received Counseling or Therapy, Last 4 Weeks,” because my research question concerns only counseling or therapy, not the other mental health measures in the dataset. Next, I kept only the rows in the “By Age” group so that values could be compared across age groups. I then removed rows with a missing “Value,” since Value is the outcome variable for the analysis. After that, I selected only the variables needed for the project: Indicator, Group, Subgroup, Time.Period.Label, and Value. I converted Subgroup to a factor and sorted the rows by age group, leaving 225 observations (33 per age group, except 27 for 80 years and above). Finally, I summarized the data by age group, computing the mean, standard deviation, and number of observations for each.
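The same preparation steps can be sketched in base R on a tiny made-up data frame (the rows and values below are hypothetical, not taken from the CDC file). Counting rows before and after the missing-value filter is a cheap way to report how much data the `!is.na(Value)` step discards. The helper name `prep_age_counseling` is illustrative only.

```r
# Hypothetical toy data shaped like the CDC file; all values are made up.
toy <- data.frame(
  Indicator = c(rep("Received Counseling or Therapy, Last 4 Weeks", 4),
                "Took Prescription Medication for Mental Health, Last 4 Weeks"),
  Group = c("By Age", "By Age", "By Age", "By Sex", "By Age"),
  Subgroup = c("18 - 29 years", "30 - 39 years", "80 years and above",
               "Male", "18 - 29 years"),
  Time.Period.Label = rep("Period A", 5),
  Value = c(10, 9, NA, 5, 20)
)

# Hypothetical helper mirroring the dplyr pipeline: keep one indicator,
# keep the "By Age" rows, drop missing Values, select columns, make a factor.
prep_age_counseling <- function(df) {
  in_scope <- df$Indicator == "Received Counseling or Therapy, Last 4 Weeks" &
    df$Group == "By Age"
  scoped <- df[in_scope, ]
  # Report how many in-scope rows the missing-value filter removes
  message(sum(is.na(scoped$Value)), " of ", nrow(scoped),
          " in-scope rows have a missing Value and are dropped")
  out <- scoped[!is.na(scoped$Value),
                c("Indicator", "Group", "Subgroup", "Time.Period.Label", "Value")]
  out$Subgroup <- factor(out$Subgroup)
  out[order(out$Subgroup), ]
}

toy_age <- prep_age_counseling(toy)
toy_age
```

On this toy input the message reports that 1 of 3 in-scope rows is dropped, and two rows remain.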
age_summary <- mental_health_age |>
  group_by(Subgroup) |>
  summarize(
    mean_value = mean(Value),
    sd_value = sd(Value),
    n = n()
  )
age_summary
## # A tibble: 7 × 4
## Subgroup mean_value sd_value n
## <fct> <dbl> <dbl> <int>
## 1 18 - 29 years 15.2 1.79 33
## 2 30 - 39 years 14.5 1.18 33
## 3 40 - 49 years 11.4 0.705 33
## 4 50 - 59 years 9.05 0.656 33
## 5 60 - 69 years 6.11 0.503 33
## 6 70 - 79 years 4.17 0.487 33
## 7 80 years and above 3.65 1.56 27
boxplot(Value ~ Subgroup,
        data = mental_health_age,
        main = "Counseling or Therapy Use by Age Group",
        xlab = "Age Group",
        ylab = "Percent Receiving Counseling",
        las = 2)
anova_model <- aov(Value ~ Subgroup, data = mental_health_age)
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## Subgroup 6 4263 710.5 607.1 <2e-16 ***
## Residuals 218 255 1.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(anova_model)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Value ~ Subgroup, data = mental_health_age)
##
## $Subgroup
## diff lwr upr p adj
## 30 - 39 years-18 - 29 years -0.7787879 -1.571401 0.0138254 0.0576892
## 40 - 49 years-18 - 29 years -3.8151515 -4.607765 -3.0225382 0.0000000
## 50 - 59 years-18 - 29 years -6.1878788 -6.980492 -5.3952655 0.0000000
## 60 - 69 years-18 - 29 years -9.1303030 -9.922916 -8.3376897 0.0000000
## 70 - 79 years-18 - 29 years -11.0727273 -11.865341 -10.2801140 0.0000000
## 80 years and above-18 - 29 years -11.5942761 -12.429764 -10.7587883 0.0000000
## 40 - 49 years-30 - 39 years -3.0363636 -3.828977 -2.2437504 0.0000000
## 50 - 59 years-30 - 39 years -5.4090909 -6.201704 -4.6164776 0.0000000
## 60 - 69 years-30 - 39 years -8.3515152 -9.144128 -7.5589019 0.0000000
## 70 - 79 years-30 - 39 years -10.2939394 -11.086553 -9.5013261 0.0000000
## 80 years and above-30 - 39 years -10.8154882 -11.650976 -9.9800005 0.0000000
## 50 - 59 years-40 - 49 years -2.3727273 -3.165341 -1.5801140 0.0000000
## 60 - 69 years-40 - 49 years -5.3151515 -6.107765 -4.5225382 0.0000000
## 70 - 79 years-40 - 49 years -7.2575758 -8.050189 -6.4649625 0.0000000
## 80 years and above-40 - 49 years -7.7791246 -8.614612 -6.9436368 0.0000000
## 60 - 69 years-50 - 59 years -2.9424242 -3.735038 -2.1498110 0.0000000
## 70 - 79 years-50 - 59 years -4.8848485 -5.677462 -4.0922352 0.0000000
## 80 years and above-50 - 59 years -5.4063973 -6.241885 -4.5709095 0.0000000
## 70 - 79 years-60 - 69 years -1.9424242 -2.735038 -1.1498110 0.0000000
## 80 years and above-60 - 69 years -2.4639731 -3.299461 -1.6284853 0.0000000
## 80 years and above-70 - 79 years -0.5215488 -1.357037 0.3139389 0.5106329
To compare the mean percentage of people receiving counseling or therapy across age groups, I used a one-way ANOVA. Because the comparison involves more than two groups (seven age groups), ANOVA is more appropriate than a series of two-sample t-tests. The dependent variable was “Value,” the percentage of people who received counseling or therapy in the last 4 weeks. The independent variable was “Subgroup,” the age group.
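One way to see why a single ANOVA is preferred over testing every pair separately: seven age groups give choose(7, 2) = 21 pairwise comparisons, and if each were an independent test at α = 0.05, the chance of at least one false positive would be about 66%. The independence assumption is a simplification (the pairwise tests share data), so treat the figure as a rough illustration rather than an exact error rate.

```r
k <- 7                              # number of age groups in mental_health_age
alpha <- 0.05                       # per-test significance level
n_pairs <- choose(k, 2)             # number of pairwise comparisons
fwer <- 1 - (1 - alpha)^n_pairs     # P(at least one false positive), assuming independent tests
round(c(pairs = n_pairs, familywise_error = fwer), 3)
```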
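The null hypothesis is that the mean percentage of people receiving counseling or therapy is the same in all age groups. Here μ₁ through μ₇ denote the mean percentages for the seven age groups, from 18 - 29 years to 80 years and above.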
H₀: μ₁ = μ₂ = μ₃ = μ₄ = μ₅ = μ₆ = μ₇
Hₐ: At least one age group mean is different.
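The ANOVA tests these hypotheses with an F statistic, the ratio of the between-group mean square to the within-group mean square. With k = 7 age groups and N = 225 observations, plugging the rounded sums of squares from the ANOVA output into the formula gives:

```latex
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{SS_{\text{between}}/(k-1)}{SS_{\text{within}}/(N-k)}
  = \frac{4263/6}{255/218}
  \approx \frac{710.5}{1.17}
  \approx 607
```

This matches the reported F value of 607.1 up to rounding of the sums of squares. A large F means the spread between age-group means is far larger than the spread within groups.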
The ANOVA produced F(6, 218) = 607.1 with a p-value below 2e-16, far smaller than the 0.05 significance level. We therefore reject the null hypothesis and conclude that the mean percentage of individuals receiving counseling differs for at least one age group. The boxplot illustrates this, with the younger age groups generally much higher than the older ones.
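The group standard deviations in the summary table range from about 0.49 (70 - 79 years) to 1.79 (18 - 29 years), while `aov()` assumes equal variances across groups. A possible robustness check, not part of the original analysis, is Welch's one-way test via base R's `oneway.test(..., var.equal = FALSE)`. The sketch below runs it on simulated data (the group names, means, and SDs are invented) so it is self-contained; on the real data one would pass `mental_health_age` instead.

```r
set.seed(42)
# Simulated stand-in for mental_health_age: three hypothetical groups with
# different means and unequal spreads (values are invented, not CDC data).
sim <- data.frame(
  Subgroup = factor(rep(c("young", "middle", "old"), each = 30)),
  Value = c(rnorm(30, mean = 15, sd = 1.8),
            rnorm(30, mean = 11, sd = 0.7),
            rnorm(30, mean = 4,  sd = 0.5))
)

classic <- summary(aov(Value ~ Subgroup, data = sim))[[1]]          # equal-variance ANOVA table
welch   <- oneway.test(Value ~ Subgroup, data = sim, var.equal = FALSE)  # Welch's ANOVA

# Compare the p-values from the equal-variance and Welch versions
c(classic_p = classic[["Pr(>F)"]][1], welch_p = welch$p.value)

# On the real data (assumes mental_health_age from the pipeline above):
# oneway.test(Value ~ Subgroup, data = mental_health_age, var.equal = FALSE)
```

If both versions lead to the same conclusion, the unequal spreads are unlikely to change the finding.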
After the ANOVA, Tukey's HSD post hoc test was used to identify which specific pairs of age groups differ. Nineteen of the 21 pairwise comparisons were statistically significant. The two exceptions were 18 - 29 vs. 30 - 39 years (adjusted p = 0.0577) and 70 - 79 vs. 80 years and above (adjusted p = 0.511). Overall, the results suggest that younger adults reported higher average rates of receiving counseling than older adults.
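To make the adjusted p-values less of a black box, the sketch below recomputes one Tukey comparison by hand with `qtukey()` and `ptukey()`, following the Tukey-Kramer formula that base R's `TukeyHSD()` applies to an `aov` fit: the difference of group means, a standard error of sqrt(MSE/2 × (1/nᵢ + 1/nⱼ)), and the studentized range distribution with k groups and the residual degrees of freedom. The three-group data are simulated and hypothetical, with unequal group sizes like the 27 vs. 33 observations in this report.

```r
set.seed(1)
# Hypothetical three-group data (invented values, not CDC estimates).
d <- data.frame(
  g = factor(rep(c("A", "B", "C"), times = c(10, 10, 8))),
  y = c(rnorm(10, 12, 1), rnorm(10, 11.5, 1), rnorm(8, 9, 1))
)
fit <- aov(y ~ g, data = d)
tk  <- TukeyHSD(fit, conf.level = 0.95)

# Ingredients of the Tukey-Kramer comparison for the pair B vs A
mse   <- sum(residuals(fit)^2) / fit$df.residual   # within-group mean square
k     <- nlevels(d$g)                              # number of groups
means <- tapply(d$y, d$g, mean)
n     <- table(d$g)
se    <- sqrt(mse / 2 * (1 / n[["A"]] + 1 / n[["B"]]))
mean_diff <- means[["B"]] - means[["A"]]

# Adjusted p-value and interval half-width from the studentized range distribution
p_adj <- ptukey(abs(mean_diff) / se, nmeans = k, df = fit$df.residual,
                lower.tail = FALSE)
half_width <- qtukey(0.95, nmeans = k, df = fit$df.residual) * se

# These should agree with the "B-A" row of TukeyHSD()
c(manual_p = p_adj, tukey_p = tk$g["B-A", "p adj"])
c(manual_lwr = mean_diff - half_width, tukey_lwr = tk$g["B-A", "lwr"])
```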
The results show that the rates of receiving counseling or therapy in the last 4 weeks, during the COVID-19 pandemic, were not the same across age groups. The ANOVA p-value was far below 0.05, so the null hypothesis was rejected. The boxplot showed a clear general trend of younger individuals receiving counseling at higher average rates than older individuals.
These findings address the research question: the percentage of adults receiving counseling or therapy differs by age group, with mean rates declining from the youngest to the oldest groups. This does not show that age directly causes lower use of counseling; it shows an association. One limitation is that the values are survey-based estimates of percentages, which are subject to sampling error and reporting bias. Another limitation is that the analysis does not explain why younger individuals receive counseling at so much higher rates than older ones. Future work could add explanatory variables or groupings, such as income, insurance, gender, or race, to better understand what influences mental health care.
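Because the dataset already contains other groupings (for example “By Race/Hispanic ethnicity” and “By Sex” in the Group counts above), the same analysis could be wrapped in a small function and rerun per grouping. The function name `compare_subgroups` and the toy data below are hypothetical. Income and insurance do not appear among the Group values listed above, so those groupings would need another data source.

```r
# Hypothetical helper: run this report's one-way ANOVA for any Group value
# in the CDC file (e.g., "By Sex", "By Race/Hispanic ethnicity").
compare_subgroups <- function(df, group_label,
                              indicator = "Received Counseling or Therapy, Last 4 Weeks") {
  sub <- df[df$Indicator == indicator & df$Group == group_label & !is.na(df$Value), ]
  sub$Subgroup <- factor(sub$Subgroup)
  fit <- aov(Value ~ Subgroup, data = sub)
  list(
    means = tapply(sub$Value, sub$Subgroup, mean),   # mean percent per subgroup
    p_value = summary(fit)[[1]][["Pr(>F)"]][1]       # overall ANOVA p-value
  )
}

# Toy input with invented values, only to show the call shape.
toy <- data.frame(
  Indicator = "Received Counseling or Therapy, Last 4 Weeks",
  Group = "By Sex",
  Subgroup = rep(c("Male", "Female"), each = 4),
  Value = c(1, 2, 3, 2, 8, 9, 10, 9)
)
res <- compare_subgroups(toy, "By Sex")
res$means
res$p_value
```

On the real data, a call such as `compare_subgroups(mental_health, "By Race/Hispanic ethnicity")` would follow the same steps used for age.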
Centers for Disease Control and Prevention. (2025). Mental Health Care in the Last 4 Weeks [Data set]. U.S. Department of Health & Human Services. Data.gov.