Effectiveness of the VHA in Preventing Veteran Suicide

Author

Divas: Aiyana Brown, Alyssa Brooks, Tamiya Pilgrim, Bailey Spyker

QR Code

#install.packages('qrcode')
library(qrcode)

qr <- qr_code("https://rpubs.com/tamiyap/1426250")
plot(qr)

Introduction

  • Veteran Suicide is a major public health concern in the United States (U.S), with consistently higher suicide rates among veterans compared to those who are not Veterans.

  • The purpose of this analysis is to find the possible relationships between those who use the Veterans Health Administration (VHA) and those who do not. As well as to better understand patterns and differences in suicide rates across different groups of Veterans.

  • Many Veterans face challenges such as mental health disorders, post traumatic stress, facing extreme transition into civilian life, and often limited support systems. All of these including many other factors can result in a heightened risk of suicide.

Project Goal

Our goal is to identify trends that lead to a higher risk of suicide and finding additional ways we can decrease the overall suicide risk among Veterans post service or during service.

Data

We obtained the Veteran data set through Kaggle, the original data set consisted of multiple tables that were differentiated by classification. We decided to combine the tables into two different data sets prior to uploading to R. The first data set that we created contains information about veterans and civilians then our second data set contains information about veterans that used their VHA benefits within 2 years leading up to their death, veteran that didn’t use VHA benefits and civilians.

library(knitr)
library(tidyverse)

data.frame(Variable_Names = names(Vet_vs_Civilian)) %>%
  knitr::kable(
    caption = "Variable Names in Veteran vs Civilian Dataset"
  )
Variable Names in Veteran vs Civilian Dataset
Variable_Names
Classification
Year_of_Death
Age_Group
Suicide_Deaths
Population_Estimate
Unadjusted_rate
Male_deaths
Male_pop
Male_unadjusted
Female_age
Female_suicide
Female_pop
Female_unadjusted
library(knitr)
library(tidyverse)

data.frame(Variable_Names = names(Va_vs_NoVa_vs_Civilian)) %>%
  knitr::kable(
    caption = "Variable Names in Veterans who use the VA and Veterans who do not Dataset"
  )
Variable Names in Veterans who use the VA and Veterans who do not Dataset
Variable_Names
Classification
Year_of_Death
Age_Group
Suicide_Deaths
Population_Estimate
Unadjusted_rate

There are multiple variables that are related to Veteran Suicide. We did use all of them for different purposes as the original data set was cleaned and then only the necessary variables were kept.

You can interact with the data using the search box, such as limiting results to a specific Year of Death or a chosen age group.

library(DT)
datatable(Vet_vs_Civilian)

You can do the same here by interacting with the data using the search box, such as limiting results to whether or not a Veteran utilized the VA or a specific Year of Death.

library(DT)
datatable(Va_vs_NoVa_vs_Civilian)

Analysis

This project analyzes the data across different levels

  • Unadjusted suicide rate

  • Relationship between unadjusted suicide rate over time while highlighting different variables

  • Unadjusted suicide rate of male and females veterans and nonveterans

  • Unadjusted suicide rate over time and by classification

Target Variable Analysis

To better understand the suicide rate variables, the data set was summarized by reporting the total number of records and variables, along with the average, maximum, and median values male, female, and overall unadjusted suicide rates. In addition, the distributions of these variables were examined using histograms. The results show that male suicide rates are consistently higher than female rates, while overall unadjusted rates between the two. This indicates a clear gender disparity, with males experiencing higher suicide rates on average.

library(tidyverse)
library(knitr)

sheet1 <- Vet_vs_Civilian
sheet2 <- Va_vs_NoVa_vs_Civilian

##Clean Data 
sheet1_clean <- sheet1 %>%
  filter(Male_unadjusted != ".", Female_unadjusted != ".") %>%
  mutate(
    Male_unadjusted = as.numeric(Male_unadjusted),
    Female_unadjusted = as.numeric(Female_unadjusted)
  )

sheet2_clean <- sheet2 %>%
  filter(Unadjusted_rate != ".") %>%
  mutate(
    Unadjusted_rate = as.numeric(Unadjusted_rate)
  )

##Summary Table: Male Unadjusted

Row1 <- sheet1_clean %>%
  summarise(
    Records = n(),
    Variables = ncol(.),
    Avg = mean(Male_unadjusted),
    Min = min(Male_unadjusted),
    Max = max(Male_unadjusted),
    Med = median(Male_unadjusted)
  )

#Summary Table: Female Unadjusted 

Row2 <- sheet1_clean %>%
  summarise(
    Records = n(),
    Variables = ncol(.),
    Avg = mean(Female_unadjusted),
    Min = min(Female_unadjusted),
    Max = max(Female_unadjusted),
    Med = median(Female_unadjusted)
  )

#Combine Tables

bind_data_rows <- data.frame(rbind(Row1, Row2))

rownames(bind_data_rows) <- c("Male Unadjusted", "Female Unadjusted")

bind_data_rows %>%
  knitr::kable(
    caption = "Summary of Male vs Female",
    digits = 2
  )
Summary of Male vs Female
Records Variables Avg Min Max Med
Male Unadjusted 184 13 28.74 17.73 54.75 27.31
Female Unadjusted 184 13 9.60 3.59 24.57 8.20
#Summary Table: Unadjusted Rate 

Row3 <- sheet2_clean %>%
  summarise(
    Records = n(),
    Variables = ncol(.),
    Avg = mean(Unadjusted_rate),
    Min = min(Unadjusted_rate),
    Max = max(Unadjusted_rate),
    Med = median(Unadjusted_rate)
  )

table2 <- data.frame(Row3)

rownames(table2) <- c("Unadjusted Rate")

table2 %>%
  knitr::kable(
    caption = "Summary of Unadjusted Rate",
    digits = 2
  )
Summary of Unadjusted Rate
Records Variables Avg Min Max Med
Unadjusted Rate 345 6 27.11 10.49 58.77 26.59
#Histogram Distributions

library(plotly)
library(ggplot2)

#Veteran vs Non-Veteran 

plot_ly(
  data = sheet1,
  x = ~Male_unadjusted,
  color = ~Classification,
  type = "histogram",
  nbinsx = 10
) %>%
  layout(
    title = "Histogram of Male Unadjusted Suicide Rate ",
    xaxis = list(title = "Male Unadjusted Values"),
    yaxis = list(title = "Frequency")
  )
sheet1 <- sheet1 %>%
  mutate(
    Female_unadjusted = na_if(Female_unadjusted, "."),
    Female_unadjusted = as.numeric(Female_unadjusted)
  )

plot_ly(
  data = sheet1,
  x = ~Female_unadjusted,
  color = ~Classification,
  type = "histogram",
  nbinsx = 10
) %>%
  layout(
    title = "Histogram of Female Unadjusted Suicide Rate",
    xaxis = list(title = "Female Unadjusted Values"),
    yaxis = list(title = "Frequency")
  )
plot_ly(
  data = sheet2,
  x = ~Unadjusted_rate,
  color = ~Classification,
  type = "histogram",
  nbinsx = 10
) %>%
  layout(
    title = "Histogram of Unadjusted Suicide Rate",
    xaxis = list(title = "Unadjusted Rate"),
    yaxis = list(title = "Frequency")
  )

Two-Dimension Analysis

This analysis examines the relationship between suicide rates and key factors including classification, age group, and time. Results show that veterans have consistently higher suicide rates than civilians across most age groups. Suicide rates also vary by age and change over time, indicating that both demographic and temporal factors influence these patterns.

ggplot(Vet_vs_Civilian, aes(x = Classification, y = Unadjusted_rate)) +
  geom_boxplot(fill = "lightblue") +
  labs(
    title = "Suicide Rates: Veterans vs Civilians",
    x = "Group",
    y = "Suicide Rate per 100,000"
  ) +
  theme_minimal()

ggplot(Vet_vs_Civilian, aes(x = Age_Group, y = Unadjusted_rate)) +
  geom_boxplot(fill = "lightblue") +
  labs(
    title = "Suicide Rates by Age Group",
    x = "Age Group",
    y = "Suicide Rate"
  ) +
  theme_minimal()

ggplot(Vet_vs_Civilian, aes(x = Year_of_Death, y = Unadjusted_rate)) +
  geom_point() +
  geom_smooth(color = "lightblue", se = FALSE) +
  labs(
    title = "Suicide Rates Over Time",
    x = "Year",
    y = "Suicide Rate"
  ) +
  theme_minimal()

ggplot(Vet_vs_Civilian, aes(x = Age_Group, y = Unadjusted_rate, fill = Classification)) +
  geom_boxplot() +
  labs(
    title = "Suicide Rates by Age Group and Classification",
    x = "Age Group",
    y = "Suicide Rate"
  ) +
  theme_minimal()

ggplot(Va_vs_NoVa_vs_Civilian, aes(x = Classification, y = Unadjusted_rate)) +
  geom_boxplot(fill = "lightblue") +
  labs(
    title = "Suicide Rates by VA Usage",
    x = "Group",
    y = "Suicide Rate"
  ) +
  theme_minimal()

Three-Dimension Analysis

The three-dimensional analysis helps us between understand how the unadjusted suicide changes over time when considering classification is considered. As well as looking at gender based unadjusted suicide rate over time # and how that looks when considering classification.

  • VHA User: Overall veterans that used their VA benefits within the last two years of their life have higher unadjusted suicide rate overall. There was a small dip in rates between 2002 and 2006 before going on a steady incline

  • Non-VHA User: Steady incline from 2001 until 2023. Rates lie between VHA Users and Civilians

  • Civilian: Civilians have the lowest unadjusted suicide rate throughout the years. Its also the only classification that has minimal incline

ggplot(Va_vs_NoVa_vs_Civilian |> filter(Age_Group != "All"), 
       aes(x = Year_of_Death , y = Unadjusted_rate, color = Classification)) + 
  geom_point() + 
  geom_smooth(se = FALSE) +
  scale_y_continuous(name = "Unadjusted Suicide Rate (per 100,000 people)") + 
  scale_x_continuous(name = "Year") + 
  labs(
    title = "Scatterplot of Unadjusted Suicide Rates Over the Years"
  )+ 
  theme(legend.position = "bottom")

Overall males have higher unadjusted suicide rates than women regardless of classification.

  • Female: In the early 2000’s the appears to be a much smaller gap between civilians and veteran. Between 2008 and 2023 the gap significantly grows at a fast rate.

  • Male: The gap between civilians and veteran is large but consistent/stable throughout the years.

library(tidyverse)
library(ggplot2)
library(ggridges)
library(scales)
library(ggrepel)
library(patchwork)

gender <- Vet_vs_Civilian |> 
  mutate(Female_suicide = as.numeric(Female_suicide)) |> 
  filter(!is.na(Female_suicide)) |> 
  mutate(male_rate = ((Male_deaths / Population_Estimate)*100000)) |> 
  mutate(fem_rate = ((Female_suicide / Population_Estimate)*100000))
  
female1 <- ggplot(gender|> filter(Age_Group != "All") |>
                    mutate(fem_rate = as.numeric(Female_unadjusted)) , 
                  aes(x = Year_of_Death , y = fem_rate, color = Classification)) + 
  geom_point()+ 
  geom_smooth(se = FALSE) +
  scale_y_continuous(name = "Female Unadjusted Suicide Rate (per 100,000 people)") + 
  scale_x_continuous(name = "Year") + 
  labs(
    title = "Scatterplots of Unadjusted Suicide Rates Over the Years Seperated By Gender"
  )+ 
  theme(legend.position = "none") 


male1 <- ggplot(gender|> filter(Age_Group != "All") , 
                aes(x = Year_of_Death , y = male_rate, color = Classification)) + 
  geom_point() +
  geom_smooth(se = FALSE) + 
  scale_y_continuous(name = "Male Unadjusted Suicide Rate (per 100,000 people)") + 
  scale_x_continuous(name = "Year") + 
  theme(legend.position = "bottom")

female1 + male1

Outlier Analysis

This analysis examines the classification categories of outliers based on male and female suicide rates for veterans and civilians, total suicide rates for VHA vs Non VHA users, as well as age group. The first step in identifying outliers is determining threshold values. The below boxplots suggest cutoffs of 21 for female suicide rates and 45 for male suicide rates. However, the total unadjusted suicide rate for vha vs non vha users has no outliers. Age group distirbution for both datasets is represented with a barchart because it’s a categorical variable. These barcharts show no outliers for those variables, as well.

vet_vs_nonvet <- Vet_vs_Civilian
va_vs_nova <- Va_vs_NoVa_vs_Civilian

library(ggthemes)
library(tidyverse)
library(patchwork)
library(ggrepel)

vet_vs_nonvet$Female_unadjusted = as.numeric(vet_vs_nonvet$Female_unadjusted, na.rm = TRUE)
vet_vs_nonvet$Male_unadjusted = as.numeric(vet_vs_nonvet$Male_unadjusted, na.rm = TRUE)
vet_vs_nonvet$Year_of_Death = as.numeric(vet_vs_nonvet$Year_of_Death, na.rm = TRUE)
va_vs_nova$Unadjusted_rate = as.numeric(va_vs_nova$Unadjusted_rate, na.rm = TRUE)

fem_plot = ggplot(vet_vs_nonvet, aes(x = Female_unadjusted)) +
  geom_boxplot(fill = "lightblue", na.rm = TRUE) +
  labs(
    title = "Female Unadjusted Suicide Rates (Veterans vs Civilians)",
    x = "Suicide Rate per 100,000"
  )

fem_plot

male_plot = ggplot(vet_vs_nonvet, aes(x = Male_unadjusted)) +
  geom_boxplot(fill = "lightblue") +
  labs(
    title = "Male Unadjusted Suicide Rates (Veterans vs Civilians)",
    x = "Suicide Rate per 100,000"
  )
  
male_plot

total_plot = ggplot(va_vs_nova, aes(x = Unadjusted_rate)) +
  geom_boxplot(fill = "lightblue") +
  labs(
    title = "Unadjusted Suicide Rates (VHA Users, Non-VHA USers, Civilians)",
    x = "Suicide Rate per 100,000"
  )
total_plot

age_group_vaciv_plot = ggplot(vet_vs_nonvet, aes(x = Age_Group))+
  geom_bar(fill = "lightblue") +
  labs(
    title = "Age Group Counts of Veteran and Civilians",
    x = "Age Group",
    y = "Count"
  )

age_group_vaciv_plot

age_group_vha_plot = ggplot(va_vs_nova, aes(x = Age_Group))+
  geom_bar(fill = "lightblue") +
  labs(
    title = "Age Group Counts of VHA Users, Non-VHA Users, and Civilians",
    x = "Age Group",
    y = "Count"
  )
age_group_vha_plot

The final analysis highlights outliers across different veteran classifications and VHA usage. The plots show that the outlier for the female unadjusted rate is a veteran between the ages of 18 and 24. For the male unadjusted rate, the majority of the outliers are civilians 75 and up. However, starting around 2016, the highest outliers are veterans between the ages of 18 to 34. The scatterplot for total suicide rate by VHA usage shows that there are no outliers for this variable.

fem_outliers = vet_vs_nonvet |>
  filter(Female_unadjusted > 21)

male_outliers = vet_vs_nonvet |>
  filter(Male_unadjusted > 45)

total_outliers= va_vs_nova |>
  filter(Unadjusted_rate >59)




fem_scatter = ggplot(vet_vs_nonvet, aes(x = Year_of_Death, y = Female_unadjusted))+
  geom_point(color = "black",na.rm = TRUE, size = 2) +
  labs(
    title = "Female Suicide Rate Outliers by Veteran Classification",
    x = "Year",
    y = "Female Unadjusted Suicide Rate per 100,000"
  ) +
  geom_point(data = fem_outliers, aes(color = Classification, shape = Age_Group, size = 1.5)) +

  theme_minimal() +
  theme(legend.position = "bottom")

  
fem_scatter 

male_scatter = ggplot(vet_vs_nonvet, aes(x = Year_of_Death, y = Male_unadjusted))+
  geom_point(color = "black",na.rm = TRUE, size = 2) +
  labs(
    title = "Male Suicide Rate Outliers by Veteran Classification",
    y = "Male Unadjusted Suicide Rate per 100,000",
    x = "Year"
  ) +
  geom_point(data = male_outliers, aes(color = Classification, shape = Age_Group, size = 1.5)) +
  scale_y_continuous(limits = c(0, 70))+
  theme_minimal() +
  theme(legend.position = "bottom")

male_scatter

total_scatter = ggplot(va_vs_nova, aes(x = Year_of_Death, y = Unadjusted_rate))+
  geom_point(color = "black",na.rm = TRUE) +
  labs(
    title = "Total Suicide Rate Outliers by VHA Usage",
    x = "Year",
    y = "Unadjusted Suicide Rate per 100,000"
  ) +
  geom_point(data = total_outliers, aes(color = Classification, shape = Age_Group, size = 1.5)) +

  theme_minimal() +
  theme(legend.position = "bottom")

total_scatter

Conclusion

  1. Veteran suicide rates are higher than civilian rates across all comparisons

  2. Male suicide rates are higher than female rates

  3. Suicide rates increase over time, especially after 2010

  4. Veterans show a stronger upward trend than civilians

  5. Suicide rates vary by age group

  6. Middle-aged and older groups tend to have higher rates

  7. Veteran rates are higher than civilian rates within every age group

  8. VHA users have the highest suicide rates

  9. Non-VHA users fall between VHA users and civilians

  10. Outliers are more common and more extreme among veterans

  11. Male veteran outliers reach the highest values overall

  12. Younger (18 - 34) and older (75+) groups show notable outliers

  13. Age group counts are evenly distributed, so results are not due to sample imbalance

Contact Information

Thanks for visiting our page!

  • Aiyana Brown

    • mbrow623@students.kennesaw.edu
  • Alyssa Brooks

    • abroo163@students.kennesaw.edu
  • Tamiya Pilgrim

    • tpilgri6@students.kennesaw.edu
  • Bailey Spyker

    • bspyker@students.kennesaw.edu

Location

#Interactive map 
#install.packages("leaflet")
library(leaflet)

leaflet() %>%
  addTiles() %>% 
  addMarkers(
    lng = -84.52,
    lat = 33.94,
    popup = "Kennesaw State University - Marrieta Campus"
  )