Introduction

For the past few assignments, I have been working with the Australian Marriage Law dataset provided in the class data folder. For homework 6, I have decided to branch out and work with a new dataset, which I will use for my final project.

Data

The data I will be working with comes from the Substance Abuse and Mental Health Services Administration (SAMHSA), an agency within the US Department of Health and Human Services. The agency’s mission is to “reduce the impact of substance abuse and mental illness on America’s communities” (https://www.samhsa.gov/).

The data comes from the National Survey on Drug Use and Health (NSDUH). The collection of detailed tables within the report contains estimates of the number of people with specified characteristics, such as substance abuse or mental illness. Data is collected by field staff through in-person surveys, which moved to remote administration in 2020 due to COVID-19. (In a later section of the final project, I will discuss the potential implications of this pandemic-driven change in methodology.) The sample design and sample size are sufficient to allow for state and national estimates. The sample includes non-institutionalized United States residents aged 12 or older, including residents of households, shelters, dormitories, and boarding houses. Individuals excluded from the sample include those with no fixed address who do not use shelters, active-duty military personnel, and residents of institutional settings (correctional facilities, nursing homes, mental institutions, and long-term care facilities). (In the final project, I will discuss the implications of excluding these segments of the population from the study.)

The table I have gathered data from:

  • Table 8.18A – Received Mental Health Services in Past Year: Among People Aged 18 or Older; by Past Year Level of Mental Illness and Geographic and Socioeconomic Characteristics, Numbers in Thousands, 2019 and 2020

This data includes the number of people (in thousands) who received mental health services in the past year. The years included in the data are 2019 and 2020. The counts are broken out by level of mental illness: any mental illness, serious mental illness, or no mental illness. The geographic and socioeconomic characteristics included in the data are geographic region, county type, poverty level, education level, and type of health insurance. The survey defines mental health services to include inpatient treatment, outpatient treatment (e.g., counseling), and/or use of prescription medication for issues related to emotions, nerves, or mental health.

Variables of Interest

Read in & Tidy Data

Read in data:

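# Packages used throughout this document
library(tidyverse)   # dplyr, tidyr, stringr, ggplot2
library(readxl)      # read_excel()
library(knitr)       # kable()
library(kableExtra)  # add_header_above(), kable_styling(), scroll_box()

menthealth <- read_excel("~/DACSS601_R/Data/US_mentalhealth.xlsx", skip = 1)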
## New names:
## * `Total (2019)` -> `Total (2019)...2`
## * `Total (2020)` -> `Total (2020)...3`
## * `Total (2019)` -> `Total (2019)...10`
## * `Total (2020)` -> `Total (2020)...11`
head(menthealth)
## # A tibble: 6 x 15
##   `Geographic/Socioecono~ `Total (2019)...~ `Total (2020)...~ `Any Mental Illne~
##   <chr>                               <dbl>             <dbl>              <dbl>
## 1 Total                               40154             42420              22953
## 2 Geographic Region                      NA                NA                 NA
## 3 Northeast                            7503              7533               4048
## 4 Midwest                              9740             10056               5682
## 5 South                               13503             15234               7650
## 6 West                                 9407              9597               5573
## # ... with 11 more variables: Any Mental Illness (2020) <dbl>,
## #   Serious Mental Illness (2019) <dbl>, Serious Mental Illness (2020) <chr>,
## #   No Mental Illness (2019) <dbl>, No Mental Illness (2020) <dbl>,
## #   Total (2019)...10 <dbl>, Total (2020)...11 <dbl>,
## #   Received Services (2019) <dbl>, Received Services (2020) <dbl>,
## #   Services Not Received (2019) <dbl>, Services Not Received (2020) <dbl>
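The column listing above shows `Serious Mental Illness (2020)` imported as `<chr>` while its neighbours are `<dbl>`, which usually means at least one cell holds non-numeric text. Below is a minimal sketch of one way to handle that, using a toy tibble rather than the real sheet. The column names, the values, and the `*` marker are all assumptions for illustration.

```r
library(dplyr)
library(readr)

# Toy stand-in for the imported sheet (hypothetical values and marker)
toy <- tibble(
  characteristic = c("Row A", "Row B", "Row C"),
  smi_2020 = c("10", "*", "30")
)

# Treat "*" as missing, then parse the remaining text as numbers
toy_clean <- toy %>%
  mutate(smi_2020 = parse_number(smi_2020, na = c("", "NA", "*")))

toy_clean
```

This report only uses the 2019 columns, so the conversion would matter mainly for a later 2019 versus 2020 comparison.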

Remove rows containing section labels geographic region, county type, poverty level, education level, and health insurance:

# Drop the section-label rows in one pass; "|" means "or" in the pattern
section_labels <- "Geographic Region|County Type|Poverty Level|Education Level|Health Insurance"

menthealth <- menthealth %>%
  filter(!str_detect(`Geographic/Socioeconomic Characteristics`, section_labels))

Select columns that I want to keep:

colnames(menthealth)
##  [1] "Geographic/Socioeconomic Characteristics"
##  [2] "Total (2019)...2"                        
##  [3] "Total (2020)...3"                        
##  [4] "Any Mental Illness (2019)"               
##  [5] "Any Mental Illness (2020)"               
##  [6] "Serious Mental Illness (2019)"           
##  [7] "Serious Mental Illness (2020)"           
##  [8] "No Mental Illness (2019)"                
##  [9] "No Mental Illness (2020)"                
## [10] "Total (2019)...10"                       
## [11] "Total (2020)...11"                       
## [12] "Received Services (2019)"                
## [13] "Received Services (2020)"                
## [14] "Services Not Received (2019)"            
## [15] "Services Not Received (2020)"
recservices <- menthealth %>%
  # select() can rename while selecting: new_name = `old name`
  select(
    Geographic_Socioeconomic_Characteristics = `Geographic/Socioeconomic Characteristics`,
    Total = `Total (2019)...2`,
    Any_Mental_Illness = `Any Mental Illness (2019)`,
    Serious_Mental_Illness = `Serious Mental Illness (2019)`,
    No_Mental_Illness = `No Mental Illness (2019)`
  )

Select values within Geographic/Socioeconomic Characteristics that I want to keep and rename them for clarity:

recservices <- recservices %>%
  # Keep only the region, poverty level, and insurance rows
  filter(Geographic_Socioeconomic_Characteristics %in% c(
    "Northeast", "Midwest", "South", "West",
    "Less than 100%", "100% - 199%", "200% or More",
    "Private", "Medicaid/CHIP", "Other", "No Coverage"
  )) %>%
  mutate(Geographic_Socioeconomic_Characteristics = case_when(
    Geographic_Socioeconomic_Characteristics == "Northeast" ~ "Northeast Region",
    Geographic_Socioeconomic_Characteristics == "Midwest" ~ "Midwest Region",
    Geographic_Socioeconomic_Characteristics == "South" ~ "South Region",
    Geographic_Socioeconomic_Characteristics == "West" ~ "West Region",
    Geographic_Socioeconomic_Characteristics == "Less than 100%" ~ "Poverty Level less than 100%",
    Geographic_Socioeconomic_Characteristics == "100% - 199%" ~ "Poverty Level 100%-199%",
    Geographic_Socioeconomic_Characteristics == "200% or More" ~ "Poverty Level more than 200%",
    Geographic_Socioeconomic_Characteristics == "Private" ~ "Private Insurance",
    Geographic_Socioeconomic_Characteristics == "Medicaid/CHIP" ~ "Medicaid/CHIP Insurance",
    Geographic_Socioeconomic_Characteristics == "Other" ~ "Other Insurance",
    Geographic_Socioeconomic_Characteristics == "No Coverage" ~ "No Insurance Coverage"
  ))

Display tidied data in a table:

kable(recservices, col.names = c("Geographic/Socioeconomic Characteristics", "Total", "Any Mental Illness", "Serious Mental Illness", "No Mental Illness"), 
      align = c('c', 'c', 'c', 'c', 'c')) %>%
  add_header_above(c("Table 1: Characteristics of Individuals in the US who Received Mental Health Services in 2019 (in Thousands)"=5))%>%
    kable_styling(fixed_thead = TRUE)%>%
  scroll_box(width = "100%", height = "100%")

Table 1: Characteristics of Individuals in the US who Received Mental Health Services in 2019 (in Thousands)

Geographic/Socioeconomic Characteristics  Total  Any Mental Illness  Serious Mental Illness  No Mental Illness
Northeast Region                           7503                4048                    1406               3455
Midwest Region                             9740                5682                    2032               4058
South Region                              13503                7650                    2902               5853
West Region                                9407                5573                    2235               3835
Poverty Level less than 100%               5578                3860                    1692               1718
Poverty Level 100%-199%                    7403                4762                    2000               2641
Poverty Level more than 200%              26957               14184                    4847              12773
Private Insurance                         26711               14061                    4725              12650
Medicaid/CHIP Insurance                    8131                5553                    2540               2578
Other Insurance                           10935                5633                    2053               5301
No Insurance Coverage                      2048                1419                     665                629

Ideally, the Geographic/Socioeconomic Characteristics column would be split so that each variable (poverty level, insurance type, and geographic region) has its own column. However, the source table reports each variable as a separate breakdown of the survey respondents rather than as a cross-tabulation, so the variables cannot be placed in different columns of one table. Instead, I will make a separate table for each variable as needed throughout the research.
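One way to keep everything in a single table while still separating the variables is to tag each row with the variable it belongs to. The following is only a sketch: it uses three rows copied from Table 1, and the names `toy`, `tagged`, `Characteristic`, and `Variable` are assumptions rather than objects from the analysis.

```r
library(dplyr)

# Three rows copied from Table 1 (counts in thousands)
toy <- tibble(
  Characteristic = c("Northeast Region", "Poverty Level 100%-199%", "Private Insurance"),
  Total = c(7503, 7403, 26711)
)

# Tag each row with the variable its label belongs to
tagged <- toy %>%
  mutate(Variable = case_when(
    grepl("Region$", Characteristic)        ~ "Geographic Region",
    grepl("^Poverty Level", Characteristic) ~ "Poverty Level",
    grepl("Insurance", Characteristic)      ~ "Insurance Type"
  ))

tagged
```

With a `Variable` column in place, each per-variable table could be produced with a single `filter(Variable == ...)` call.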

Variable 1: Geographic Region

I will make a table for each of the three geographic/socioeconomic variables I am interested in working with, starting with geographic region.

georegion <- recservices %>%
  filter(Geographic_Socioeconomic_Characteristics %in%
           c("Northeast Region", "Midwest Region", "South Region", "West Region")) %>%
  # Rename the label column and move Total to the end in one step
  select(Geographic_Region = Geographic_Socioeconomic_Characteristics,
         Any_Mental_Illness, Serious_Mental_Illness, No_Mental_Illness, Total)

kable(georegion, col.names = c("Geographic Region", "Any Mental Illness", "Serious Mental Illness", "No Mental Illness", "Total"), 
      align = c('c', 'c', 'c', 'c', 'c')) %>%
  add_header_above(c("Table 2: Individuals in the US who Received Mental Health Services in 2019 by Geographic Region (in Thousands)"=5))%>%
    kable_styling(fixed_thead = TRUE)%>%
  scroll_box(width = "100%", height = "100%")

Table 2: Individuals in the US who Received Mental Health Services in 2019 by Geographic Region (in Thousands)

Geographic Region  Any Mental Illness  Serious Mental Illness  No Mental Illness  Total
Northeast Region                 4048                    1406               3455   7503
Midwest Region                   5682                    2032               4058   9740
South Region                     7650                    2902               5853  13503
West Region                      5573                    2235               3835   9407

georegion1 <- georegion %>%
  pivot_longer(c(Any_Mental_Illness, Serious_Mental_Illness, No_Mental_Illness), names_to = "Mental_Illness_Type", values_to = "Number_Treated")%>%
  select(Geographic_Region, Mental_Illness_Type, Number_Treated)

ggplot(data = georegion1) +
  geom_bar(mapping = aes(x = Mental_Illness_Type, y = Number_Treated, fill = Geographic_Region), stat = "identity", position = "dodge") +
  labs(title = "Graph 1: Number of Individuals who Received Treatment in 2019", 
       subtitle = "By Geographic Region", x = "Type of Mental Illness", y = "Number of People Treated (in Thousands)")
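
When reading Graph 1, it may help to note that the three categories on the x-axis do not appear to be mutually exclusive. In each row of Table 2, Any Mental Illness plus No Mental Illness adds up to Total (to within rounding), so Serious Mental Illness appears to be counted within Any Mental Illness rather than in addition to it. A quick arithmetic check with the Table 2 values, sketched in base R (the object name `check` is only illustrative):

```r
# Values copied from Table 2 (counts in thousands)
check <- data.frame(
  Region  = c("Northeast", "Midwest", "South", "West"),
  Any     = c(4048, 5682, 7650, 5573),
  Serious = c(1406, 2032, 2902, 2235),
  No      = c(3455, 4058, 5853, 3835),
  Total   = c(7503, 9740, 13503, 9407)
)

# Any + No should reproduce Total, apart from rounding in the source table
check$Any_plus_No <- check$Any + check$No
check$Difference  <- check$Total - check$Any_plus_No

check
```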

Variable 2: Poverty Level

povertylvl <- recservices %>%
  filter(Geographic_Socioeconomic_Characteristics %in%
           c("Poverty Level less than 100%", "Poverty Level 100%-199%",
             "Poverty Level more than 200%")) %>%
  # Rename the label column and move Total to the end in one step
  select(Poverty_Level = Geographic_Socioeconomic_Characteristics,
         Any_Mental_Illness, Serious_Mental_Illness, No_Mental_Illness, Total)

kable(povertylvl, col.names = c("Poverty Level", "Any Mental Illness", "Serious Mental Illness", "No Mental Illness", "Total"), 
      align = c('c', 'c', 'c', 'c', 'c')) %>%
  add_header_above(c("Table 3: Individuals in the US who Received Mental Health Services in 2019 by Poverty Level (in Thousands)"=5))%>%
    kable_styling(fixed_thead = TRUE)%>%
  scroll_box(width = "100%", height = "100%")

Table 3: Individuals in the US who Received Mental Health Services in 2019 by Poverty Level (in Thousands)

Poverty Level                 Any Mental Illness  Serious Mental Illness  No Mental Illness  Total
Poverty Level less than 100%                3860                    1692               1718   5578
Poverty Level 100%-199%                     4762                    2000               2641   7403
Poverty Level more than 200%               14184                    4847              12773  26957

povertylvl1 <- povertylvl %>%
  pivot_longer(c(Any_Mental_Illness, Serious_Mental_Illness, No_Mental_Illness), names_to = "Mental_Illness_Type", values_to = "Number_Treated")%>%
  select(Poverty_Level, Mental_Illness_Type, Number_Treated)

ggplot(data = povertylvl1) +
  geom_bar(mapping = aes(x = Mental_Illness_Type, y = Number_Treated, fill = Poverty_Level), stat = "identity", position = "dodge") +
  labs(title = "Graph 2: Number of Individuals who Received Treatment in 2019", 
       subtitle = "By Poverty Level", x = "Type of Mental Illness", y = "Number of People Treated (in Thousands)")
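
Graph 2 compares raw counts, so the largest group dominates every cluster of bars. One way to compare the groups on a more equal footing is to look at the share of treated individuals in each poverty level who had any mental illness. This is a sketch using the Table 3 values; the names `poverty_counts`, `poverty_shares`, and `Share_Any` are assumptions.

```r
library(dplyr)

# Values copied from Table 3 (counts in thousands)
poverty_counts <- tibble(
  Poverty_Level = c("Less than 100%", "100%-199%", "More than 200%"),
  Any_Mental_Illness = c(3860, 4762, 14184),
  Total = c(5578, 7403, 26957)
)

# Share of people receiving services who had any mental illness
poverty_shares <- poverty_counts %>%
  mutate(Share_Any = Any_Mental_Illness / Total)

poverty_shares
```

These shares describe only people who received services; they do not say how likely someone in each poverty level is to receive services in the first place.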

Variable 3: Type of Insurance

insurance <- recservices %>%
  filter(Geographic_Socioeconomic_Characteristics %in%
           c("Private Insurance", "Medicaid/CHIP Insurance",
             "Other Insurance", "No Insurance Coverage")) %>%
  # Rename the label column and move Total to the end in one step
  select(Insurance_Type = Geographic_Socioeconomic_Characteristics,
         Any_Mental_Illness, Serious_Mental_Illness, No_Mental_Illness, Total)

kable(insurance, col.names = c("Insurance Type", "Any Mental Illness", "Serious Mental Illness", "No Mental Illness", "Total"), 
      align = c('c', 'c', 'c', 'c', 'c')) %>%
  add_header_above(c("Table 4: Individuals in the US who Received Mental Health Services in 2019 by Insurance Type (in Thousands)"=5))%>%
    kable_styling(fixed_thead = TRUE)%>%
  scroll_box(width = "100%", height = "100%")

Table 4: Individuals in the US who Received Mental Health Services in 2019 by Insurance Type (in Thousands)

Insurance Type           Any Mental Illness  Serious Mental Illness  No Mental Illness  Total
Private Insurance                     14061                    4725              12650  26711
Medicaid/CHIP Insurance                5553                    2540               2578   8131
Other Insurance                        5633                    2053               5301  10935
No Insurance Coverage                  1419                     665                629   2048

insurance1 <- insurance %>%
  pivot_longer(c(Any_Mental_Illness, Serious_Mental_Illness, No_Mental_Illness), names_to = "Mental_Illness_Type", values_to = "Number_Treated")%>%
  select(Insurance_Type, Mental_Illness_Type, Number_Treated)

ggplot(data = insurance1) +
  geom_bar(mapping = aes(x = Mental_Illness_Type, y = Number_Treated, fill = Insurance_Type), stat = "identity", position = "dodge") +
  labs(title = "Graph 3: Number of Individuals who Received Treatment in 2019", 
       subtitle = "By Insurance Type", x = "Type of Mental Illness", y = "Number of People Treated (in Thousands)")
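
Before comparing insurance types, it may be worth checking whether the categories partition the population the way the regions do. A small base R sketch using the Total columns of Tables 2 and 4 and the overall 2019 total from the first row of the imported data (the object names are illustrative):

```r
# Totals copied from the tables above (counts in thousands)
overall_total    <- 40154
region_totals    <- c(Northeast = 7503, Midwest = 9740, South = 13503, West = 9407)
insurance_totals <- c(Private = 26711, Medicaid_CHIP = 8131, Other = 10935, None = 2048)

# Regions should add up to roughly the overall total
sum(region_totals) - overall_total

# Insurance types add up to noticeably more than the overall total
sum(insurance_totals) - overall_total
```

The insurance categories sum to more than the overall total, which suggests that respondents can report more than one type of coverage, so the bars in Graph 3 may count some individuals more than once.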

Description of Relationship between Poverty Level and Receiving Treatment

After viewing Graph 2, I am interested in further exploring the relationship between poverty level and receiving mental health services. I will use the variables: total number of individuals who received treatment and poverty level.

povertytx <- povertylvl %>%
  select(Poverty_Level, Total)%>%
  pivot_wider(names_from = Poverty_Level, values_from = Total)

kable(povertytx, col.names = c("Less than 100%", "100% - 199%", "More than 200%"), 
      align = c('c', 'c', 'c')) %>%
  add_header_above(c("Table 5: Total Number of Individuals who Received Treatment by Poverty Level (in Thousands)"=3))%>%
    kable_styling(fixed_thead = TRUE)%>%
  scroll_box(width = "100%", height = "100%")

Table 5: Total Number of Individuals who Received Treatment by Poverty Level (in Thousands)

Less than 100%  100% - 199%  More than 200%
          5578         7403           26957

Question:

Is poverty level independent from receiving mental health treatment?

Hypothesis:

Chi Square Test of Independence:

chisq.test(povertytx)
## 
##  Chi-squared test for given probabilities
## 
## data:  povertytx
## X-squared = 21101, df = 2, p-value < 2.2e-16

Since the p-value is less than 0.05, the null hypothesis is rejected.
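
The output header, "Chi-squared test for given probabilities", indicates that R ran a goodness-of-fit test rather than a test of independence. When `chisq.test()` receives a single row of counts and no `p` argument, it tests whether the categories are equally likely. Here is a minimal sketch that reproduces the statistic by hand from the Table 5 counts (the object names are illustrative):

```r
# Counts from Table 5 (in thousands)
observed <- c(5578, 7403, 26957)

# Under the default null, each poverty level is expected to hold one third
expected <- rep(sum(observed) / length(observed), length(observed))

# Pearson chi-squared statistic, computed by hand
x_squared <- sum((observed - expected)^2 / expected)
x_squared

# The same test through chisq.test(), called on a plain vector
result <- chisq.test(observed)
result$statistic
```

The null hypothesis rejected above is therefore that the three poverty levels contribute equal numbers of treated individuals, which is a different question from whether poverty level and treatment are independent.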

Disclaimers: I believe I used the incorrect statistical model to evaluate a potential relationship between the variables. I will continue to explore this in my final project. Also, I will need to compare the proportion of individuals receiving treatment within each poverty level rather than just the counts of those receiving treatment, since the poverty groups differ in size and raw counts alone may be skewing the results.
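
A test of independence needs a two-way table, with counts of people who did and did not receive services within each poverty level. The sketch below shows only the shape of such an analysis. The `Received` column comes from Table 5, but the `Not_Received` column is made up for illustration and would have to be replaced with real NSDUH estimates before drawing any conclusion.

```r
# Received: Table 5 (in thousands). Not_Received: HYPOTHETICAL placeholders.
treatment <- matrix(
  c(5578, 7403, 26957,      # received services (from Table 5)
    30000, 40000, 140000),  # did not receive services (made-up values)
  nrow = 3,
  dimnames = list(
    Poverty_Level = c("Less than 100%", "100%-199%", "More than 200%"),
    Services = c("Received", "Not_Received")
  )
)

# Proportion receiving services within each poverty level
prop.table(treatment, margin = 1)

# With a 3 x 2 matrix, chisq.test() runs a test of independence
independence <- chisq.test(treatment)
independence$method
```

Because NSDUH figures are weighted estimates reported in thousands, a chi-squared test on them is probably best treated as a rough guide rather than a formal result.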

Citations

  1. Center for Behavioral Health Statistics and Quality. (2021). Results from the 2020 National Survey on Drug Use and Health: Detailed tables. Rockville, MD: Substance Abuse and Mental Health Services Administration. Retrieved from https://www.samhsa.gov/data/

  2. Center for Behavioral Health Statistics and Quality. (2021). 2020 National Survey on Drug Use and Health (NSDUH): Methodological summary and definitions. Rockville, MD: Substance Abuse and Mental Health Services Administration. Retrieved from https://www.samhsa.gov/data/