For the past few assignments, I have been working with the Australian Marriage Law dataset provided in the class data folder. For homework 6, I have decided to branch out and work with a new dataset, which I will use for my final project.
The data I will be working with comes from the Substance Abuse and Mental Health Services Administration (SAMHSA), which is an agency within the US Department of Health and Human Services. The agency’s mission is to “reduce the impact of substance abuse and mental illness on America’s communities”. [https://www.samhsa.gov/]
The data comes from the National Survey on Drug Use and Health (NSDUH). The report's detailed tables contain estimates of the number of people with specified characteristics, such as substance abuse or mental illness. Field staff collect the data by administering in-person surveys. In 2020 the surveys moved to remote administration because of COVID-19; in a later section of the final project I will discuss the potential implications of this pandemic-driven change in methodology. The sample design and sample size are sufficient to produce both state and national estimates. The sample includes non-institutionalized United States residents aged 12 or older, including residents of households, shelters, dormitories, and boarding houses. Excluded from the sample are people with no fixed address who do not use shelters, active-duty military personnel, and residents of institutions (correctional facilities, nursing homes, mental institutions, and long-term care facilities). In the final project I will discuss the implications of excluding these segments of the population from the study.
The table I have gathered data from:
This data includes the number of people (in thousands) who received mental health services in the past year, for 2019 and 2020. The counts are broken down into those with any mental illness, serious mental illness, or no mental illness. The geographic and socioeconomic characteristics included are geographic region, county type, poverty level, education level, and type of health insurance. This research defines mental health services as inpatient treatment, outpatient treatment (e.g., counseling), and/or use of prescription medication for issues related to emotions, nerves, or mental health.
Read in data:
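library(tidyverse)   # dplyr, stringr, tidyr, ggplot2
library(readxl)      # read_excel()
library(knitr)       # kable()
library(kableExtra)  # add_header_above(), kable_styling(), scroll_box()
menthealth <- read_excel("~/DACSS601_R/Data/US_mentalhealth.xlsx", skip = 1)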
## New names:
## * `Total (2019)` -> `Total (2019)...2`
## * `Total (2020)` -> `Total (2020)...3`
## * `Total (2019)` -> `Total (2019)...10`
## * `Total (2020)` -> `Total (2020)...11`
head(menthealth)
## # A tibble: 6 x 15
## `Geographic/Socioecono~ `Total (2019)...~ `Total (2020)...~ `Any Mental Illne~
## <chr> <dbl> <dbl> <dbl>
## 1 Total 40154 42420 22953
## 2 Geographic Region NA NA NA
## 3 Northeast 7503 7533 4048
## 4 Midwest 9740 10056 5682
## 5 South 13503 15234 7650
## 6 West 9407 9597 5573
## # ... with 11 more variables: Any Mental Illness (2020) <dbl>,
## # Serious Mental Illness (2019) <dbl>, Serious Mental Illness (2020) <chr>,
## # No Mental Illness (2019) <dbl>, No Mental Illness (2020) <dbl>,
## # Total (2019)...10 <dbl>, Total (2020)...11 <dbl>,
## # Received Services (2019) <dbl>, Received Services (2020) <dbl>,
## # Services Not Received (2019) <dbl>, Services Not Received (2020) <dbl>
Remove the rows that only hold section labels (Geographic Region, County Type, Poverty Level, Education Level, and Health Insurance):
menthealth <- menthealth %>%
  # Drop the section-label rows (one regex alternation instead of five separate filters)
  filter(!str_detect(`Geographic/Socioeconomic Characteristics`,
                     "Geographic Region|County Type|Poverty Level|Education Level|Health Insurance"))
Select columns that I want to keep:
colnames(menthealth)
## [1] "Geographic/Socioeconomic Characteristics"
## [2] "Total (2019)...2"
## [3] "Total (2020)...3"
## [4] "Any Mental Illness (2019)"
## [5] "Any Mental Illness (2020)"
## [6] "Serious Mental Illness (2019)"
## [7] "Serious Mental Illness (2020)"
## [8] "No Mental Illness (2019)"
## [9] "No Mental Illness (2020)"
## [10] "Total (2019)...10"
## [11] "Total (2020)...11"
## [12] "Received Services (2019)"
## [13] "Received Services (2020)"
## [14] "Services Not Received (2019)"
## [15] "Services Not Received (2020)"
recservices <- menthealth %>%
  # Keep the 2019 columns; select() renames them to syntactic names in the same step
  select(Geographic_Socioeconomic_Characteristics = `Geographic/Socioeconomic Characteristics`,
         Total = `Total (2019)...2`,
         Any_Mental_Illness = `Any Mental Illness (2019)`,
         Serious_Mental_Illness = `Serious Mental Illness (2019)`,
         No_Mental_Illness = `No Mental Illness (2019)`)
Keep only the rows for geographic region, poverty level, and insurance type, and rename them for clarity:
recservices <- recservices %>%
  # Keep only the region, poverty level, and insurance rows
  filter(Geographic_Socioeconomic_Characteristics %in% c(
    "Northeast", "Midwest", "South", "West",
    "Less than 100%", "100% - 199%", "200% or More",
    "Private", "Medicaid/CHIP", "Other", "No Coverage")) %>%
  # Relabel them so each row names the variable it belongs to
  mutate(Geographic_Socioeconomic_Characteristics = case_when(
    Geographic_Socioeconomic_Characteristics == "Northeast" ~ "Northeast Region",
    Geographic_Socioeconomic_Characteristics == "Midwest" ~ "Midwest Region",
    Geographic_Socioeconomic_Characteristics == "South" ~ "South Region",
    Geographic_Socioeconomic_Characteristics == "West" ~ "West Region",
    Geographic_Socioeconomic_Characteristics == "Less than 100%" ~ "Poverty Level less than 100%",
    Geographic_Socioeconomic_Characteristics == "100% - 199%" ~ "Poverty Level 100%-199%",
    Geographic_Socioeconomic_Characteristics == "200% or More" ~ "Poverty Level more than 200%",
    Geographic_Socioeconomic_Characteristics == "Private" ~ "Private Insurance",
    Geographic_Socioeconomic_Characteristics == "Medicaid/CHIP" ~ "Medicaid/CHIP Insurance",
    Geographic_Socioeconomic_Characteristics == "Other" ~ "Other Insurance",
    Geographic_Socioeconomic_Characteristics == "No Coverage" ~ "No Insurance Coverage"
  ))
Display tidied data in a table:
kable(recservices,
      col.names = c("Geographic/Socioeconomic Characteristics", "Total", "Any Mental Illness",
                    "Serious Mental Illness", "No Mental Illness"),
      align = c('c', 'c', 'c', 'c', 'c')) %>%
  add_header_above(c("Table 1: Characteristics of Individuals in the US who Received Mental Health Services in 2019 (in Thousands)" = 5)) %>%
  kable_styling(fixed_thead = TRUE) %>%
  scroll_box(width = "100%", height = "100%")
| Geographic/Socioeconomic Characteristics | Total | Any Mental Illness | Serious Mental Illness | No Mental Illness |
|---|---|---|---|---|
| Northeast Region | 7503 | 4048 | 1406 | 3455 |
| Midwest Region | 9740 | 5682 | 2032 | 4058 |
| South Region | 13503 | 7650 | 2902 | 5853 |
| West Region | 9407 | 5573 | 2235 | 3835 |
| Poverty Level less than 100% | 5578 | 3860 | 1692 | 1718 |
| Poverty Level 100%-199% | 7403 | 4762 | 2000 | 2641 |
| Poverty Level more than 200% | 26957 | 14184 | 4847 | 12773 |
| Private Insurance | 26711 | 14061 | 4725 | 12650 |
| Medicaid/CHIP Insurance | 8131 | 5553 | 2540 | 2578 |
| Other Insurance | 10935 | 5633 | 2053 | 5301 |
| No Insurance Coverage | 2048 | 1419 | 665 | 629 |
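NSDUH counts serious mental illness within any mental illness. In every row, Any Mental Illness plus No Mental Illness should therefore equal Total, and Serious Mental Illness should never exceed Any Mental Illness. The sketch below checks both conditions on the Table 1 values. The values are typed in by hand so the sketch runs without the Excel file, and the data frame name `tab1` is only for illustration. The sums match Total to within 1 thousand, which is consistent with rounding of the estimates.

```r
# Table 1 counts (in thousands), copied by hand from the table above
tab1 <- data.frame(
  characteristic = c("Northeast Region", "Midwest Region", "South Region", "West Region",
                     "Poverty Level less than 100%", "Poverty Level 100%-199%",
                     "Poverty Level more than 200%", "Private Insurance",
                     "Medicaid/CHIP Insurance", "Other Insurance", "No Insurance Coverage"),
  Total = c(7503, 9740, 13503, 9407, 5578, 7403, 26957, 26711, 8131, 10935, 2048),
  Any_Mental_Illness = c(4048, 5682, 7650, 5573, 3860, 4762, 14184, 14061, 5553, 5633, 1419),
  Serious_Mental_Illness = c(1406, 2032, 2902, 2235, 1692, 2000, 4847, 4725, 2540, 2053, 665),
  No_Mental_Illness = c(3455, 4058, 5853, 3835, 1718, 2641, 12773, 12650, 2578, 5301, 629)
)

# Any + No mental illness should reproduce Total (gap of at most 1 thousand from rounding)
tab1$gap <- tab1$Any_Mental_Illness + tab1$No_Mental_Illness - tab1$Total
print(tab1[, c("characteristic", "gap")])

# Serious mental illness is a subset of any mental illness, so it can never be larger
print(all(tab1$Serious_Mental_Illness <= tab1$Any_Mental_Illness))
```

The same checks could be run directly on recservices before plotting. They also mean that the Serious Mental Illness bars in the graphs show people who are already included in the Any Mental Illness bars.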
In a fully tidy dataset, each of these three variables (geographic region, poverty level, and insurance type) would have its own column. However, the rows are separate breakdowns of the same group of people rather than a cross-tabulation. For example, there is no count of Northeast residents below the poverty line. The variables therefore cannot become columns of a single table, so I will make a separate table for each of the 3 geographic/socioeconomic variables I am interested in working with.
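Adding up each breakdown and comparing it with the overall total of 40,154 thousand (the Total row of the raw data) shows why these rows cannot share one table. The regions sum to the total within rounding, the poverty levels fall slightly short of it, and the insurance types overshoot it. The likely explanations are assumptions, not confirmed against the NSDUH table footnotes: poverty level could presumably not be determined for some respondents, and a person can presumably report more than one type of insurance. The list name `breakdowns` is hypothetical.

```r
# Overall total (in thousands) from the "Total" row of the raw data
overall_total <- 40154

# Totals (in thousands) for each breakdown, copied from Table 1
breakdowns <- list(
  region    = c(Northeast = 7503, Midwest = 9740, South = 13503, West = 9407),
  poverty   = c(below_100 = 5578, from_100_to_199 = 7403, above_200 = 26957),
  insurance = c(private = 26711, medicaid_chip = 8131, other = 10935, none = 2048)
)

# Each breakdown splits the same population on its own, so each is compared
# with the overall total separately
sums <- sapply(breakdowns, sum)
print(sums)
print(sums - overall_total)
```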
georegion <- recservices %>%
  filter(Geographic_Socioeconomic_Characteristics %in%
           c("Northeast Region", "Midwest Region", "South Region", "West Region")) %>%
  # select() renames and reorders the columns in one step
  select(Geographic_Region = Geographic_Socioeconomic_Characteristics,
         Any_Mental_Illness, Serious_Mental_Illness, No_Mental_Illness, Total)
kable(georegion,
      col.names = c("Geographic Region", "Any Mental Illness", "Serious Mental Illness", "No Mental Illness", "Total"),
      align = c('c', 'c', 'c', 'c', 'c')) %>%
  add_header_above(c("Table 2: Individuals in the US who Received Mental Health Services in 2019 by Geographic Region (in Thousands)" = 5)) %>%
  kable_styling(fixed_thead = TRUE) %>%
  scroll_box(width = "100%", height = "100%")
| Geographic Region | Any Mental Illness | Serious Mental Illness | No Mental Illness | Total |
|---|---|---|---|---|
| Northeast Region | 4048 | 1406 | 3455 | 7503 |
| Midwest Region | 5682 | 2032 | 4058 | 9740 |
| South Region | 7650 | 2902 | 5853 | 13503 |
| West Region | 5573 | 2235 | 3835 | 9407 |
georegion1 <- georegion %>%
  pivot_longer(c(Any_Mental_Illness, Serious_Mental_Illness, No_Mental_Illness),
               names_to = "Mental_Illness_Type", values_to = "Number_Treated") %>%
  select(Geographic_Region, Mental_Illness_Type, Number_Treated)
# geom_col() is the idiomatic form of geom_bar(stat = "identity")
ggplot(data = georegion1) +
  geom_col(mapping = aes(x = Mental_Illness_Type, y = Number_Treated, fill = Geographic_Region),
           position = "dodge") +
  labs(title = "Graph 1: Number of Individuals who Received Treatment in 2019",
       subtitle = "By Geographic Region", x = "Type of Mental Illness",
       y = "Number of People Treated (in Thousands)")
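The sketch below repeats the pivot_longer() call on the Table 2 counts to make the reshaping step concrete. The counts are typed in by hand, and `georegion_demo` is a hypothetical stand-in for georegion. Each of the 4 regions becomes 3 rows, one per mental illness type. That long shape lets ggplot() put illness type on the x axis and region in the fill. Graphs 2 and 3 rely on the same reshaping of povertylvl and insurance.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hypothetical stand-in for georegion, using the Table 2 counts (in thousands)
georegion_demo <- tibble(
  Geographic_Region = c("Northeast Region", "Midwest Region", "South Region", "West Region"),
  Any_Mental_Illness = c(4048, 5682, 7650, 5573),
  Serious_Mental_Illness = c(1406, 2032, 2902, 2235),
  No_Mental_Illness = c(3455, 4058, 5853, 3835),
  Total = c(7503, 9740, 13503, 9407)
)

# Wide (one row per region) -> long (one row per region and illness type)
georegion_long <- georegion_demo %>%
  pivot_longer(c(Any_Mental_Illness, Serious_Mental_Illness, No_Mental_Illness),
               names_to = "Mental_Illness_Type", values_to = "Number_Treated") %>%
  select(Geographic_Region, Mental_Illness_Type, Number_Treated)

print(georegion_long, n = 12)
```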
povertylvl <- recservices %>%
  filter(Geographic_Socioeconomic_Characteristics %in%
           c("Poverty Level less than 100%", "Poverty Level 100%-199%", "Poverty Level more than 200%")) %>%
  # select() renames and reorders the columns in one step
  select(Poverty_Level = Geographic_Socioeconomic_Characteristics,
         Any_Mental_Illness, Serious_Mental_Illness, No_Mental_Illness, Total)
kable(povertylvl,
      col.names = c("Poverty Level", "Any Mental Illness", "Serious Mental Illness", "No Mental Illness", "Total"),
      align = c('c', 'c', 'c', 'c', 'c')) %>%
  add_header_above(c("Table 3: Individuals in the US who Received Mental Health Services in 2019 by Poverty Level (in Thousands)" = 5)) %>%
  kable_styling(fixed_thead = TRUE) %>%
  scroll_box(width = "100%", height = "100%")
| Poverty Level | Any Mental Illness | Serious Mental Illness | No Mental Illness | Total |
|---|---|---|---|---|
| Poverty Level less than 100% | 3860 | 1692 | 1718 | 5578 |
| Poverty Level 100%-199% | 4762 | 2000 | 2641 | 7403 |
| Poverty Level more than 200% | 14184 | 4847 | 12773 | 26957 |
povertylvl1 <- povertylvl %>%
  pivot_longer(c(Any_Mental_Illness, Serious_Mental_Illness, No_Mental_Illness),
               names_to = "Mental_Illness_Type", values_to = "Number_Treated") %>%
  select(Poverty_Level, Mental_Illness_Type, Number_Treated)
ggplot(data = povertylvl1) +
  geom_col(mapping = aes(x = Mental_Illness_Type, y = Number_Treated, fill = Poverty_Level),
           position = "dodge") +
  labs(title = "Graph 2: Number of Individuals who Received Treatment in 2019",
       subtitle = "By Poverty Level", x = "Type of Mental Illness",
       y = "Number of People Treated (in Thousands)")
insurance <- recservices %>%
  filter(Geographic_Socioeconomic_Characteristics %in%
           c("Private Insurance", "Medicaid/CHIP Insurance", "Other Insurance", "No Insurance Coverage")) %>%
  # select() renames and reorders the columns in one step
  select(Insurance_Type = Geographic_Socioeconomic_Characteristics,
         Any_Mental_Illness, Serious_Mental_Illness, No_Mental_Illness, Total)
kable(insurance,
      col.names = c("Insurance Type", "Any Mental Illness", "Serious Mental Illness", "No Mental Illness", "Total"),
      align = c('c', 'c', 'c', 'c', 'c')) %>%
  add_header_above(c("Table 4: Individuals in the US who Received Mental Health Services in 2019 by Insurance Type (in Thousands)" = 5)) %>%
  kable_styling(fixed_thead = TRUE) %>%
  scroll_box(width = "100%", height = "100%")
| Insurance Type | Any Mental Illness | Serious Mental Illness | No Mental Illness | Total |
|---|---|---|---|---|
| Private Insurance | 14061 | 4725 | 12650 | 26711 |
| Medicaid/CHIP Insurance | 5553 | 2540 | 2578 | 8131 |
| Other Insurance | 5633 | 2053 | 5301 | 10935 |
| No Insurance Coverage | 1419 | 665 | 629 | 2048 |
insurance1 <- insurance %>%
  pivot_longer(c(Any_Mental_Illness, Serious_Mental_Illness, No_Mental_Illness),
               names_to = "Mental_Illness_Type", values_to = "Number_Treated") %>%
  select(Insurance_Type, Mental_Illness_Type, Number_Treated)
ggplot(data = insurance1) +
  geom_col(mapping = aes(x = Mental_Illness_Type, y = Number_Treated, fill = Insurance_Type),
           position = "dodge") +
  labs(title = "Graph 3: Number of Individuals who Received Treatment in 2019",
       subtitle = "By Insurance Type", x = "Type of Mental Illness",
       y = "Number of People Treated (in Thousands)")
After viewing Graph 2, I am interested in further exploring the relationship between poverty level and receiving mental health services. I will use two variables: the total number of individuals who received treatment and poverty level.
povertytx <- povertylvl %>%
  select(Poverty_Level, Total) %>%
  pivot_wider(names_from = Poverty_Level, values_from = Total)
kable(povertytx,
      col.names = c("Less than 100%", "100% - 199%", "More than 200%"),
      align = c('c', 'c', 'c')) %>%
  add_header_above(c("Table 5: Total Number of Individuals who Received Treatment by Poverty Level (in Thousands)" = 3)) %>%
  kable_styling(fixed_thead = TRUE) %>%
  scroll_box(width = "100%", height = "100%")
| Less than 100% | 100% - 199% | More than 200% |
|---|---|---|
| 5578 | 7403 | 26957 |
Question:
Is poverty level independent from receiving mental health treatment?
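Hypothesis:
H0: Poverty level and receiving mental health treatment are independent.
H1: Poverty level and receiving mental health treatment are not independent.
Chi-Squared Test of Independence: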
chisq.test(povertytx)
##
## Chi-squared test for given probabilities
##
## data: povertytx
## X-squared = 21101, df = 2, p-value < 2.2e-16
Since the p-value is less than 0.05, the null hypothesis is rejected.
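Disclaimers: I believe I used the incorrect statistical model to evaluate a potential relationship between the variables. Because povertytx has only one row, chisq.test() treated it as a one-dimensional table. It therefore ran a goodness-of-fit test against equal proportions rather than a test of independence, which is why the output reads "Chi-squared test for given probabilities". The small p-value only shows that the three counts are not equal. A true test of independence would also need the number of people at each poverty level who did not receive services. In addition, these counts are weighted population estimates in thousands rather than raw survey responses, so the chi-squared statistic does not reflect the survey's actual sample size or its complex design. I will continue to explore this in my final project. I will also need to look at the proportion of individuals below and above the poverty threshold who received treatment, rather than just counts, since the number of people in each poverty group differs and raw counts may be skewing the results.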
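The goodness-of-fit reading of the output can be confirmed by hand, as in this minimal sketch using the Table 5 counts. Each group gets the same expected count E = N/3. Pearson's statistic, the sum of (O - E)^2 / E, then reproduces the X-squared of about 21101 printed above. Calling chisq.test() on the same counts as a plain vector uses equal probabilities by default and returns the same value with df = 2. chisq.test() only runs a test of independence when it receives a matrix with at least two rows and two columns. The variable names below are illustrative.

```r
# Table 5 counts (in thousands) of people who received services, by poverty level
observed <- c(less_than_100 = 5578, from_100_to_199 = 7403, more_than_200 = 26957)

# Goodness of fit against equal proportions: expected count is N / 3 in every group
expected <- rep(sum(observed) / length(observed), length(observed))
x2_by_hand <- sum((observed - expected)^2 / expected)

# chisq.test() on a vector (or a one-row table) runs this same goodness-of-fit test
gof <- chisq.test(observed)

print(x2_by_hand)
print(unname(gof$statistic))
print(gof$parameter)  # df = number of groups - 1
```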
Center for Behavioral Health Statistics and Quality. (2021). Results from the 2020 National Survey on Drug Use and Health: Detailed tables. Rockville, MD: Substance Abuse and Mental Health Services Administration. Retrieved from https://www.samhsa.gov/data/
Center for Behavioral Health Statistics and Quality. (2021). 2020 National Survey on Drug Use and Health (NSDUH): Methodological summary and definitions. Rockville, MD: Substance Abuse and Mental Health Services Administration. Retrieved from https://www.samhsa.gov/data/