library(tidyverse)
## Warning: package 'ggplot2' was built under R version 4.4.1
## Warning: package 'dplyr' was built under R version 4.4.1
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
suicide <- read.csv("suicide.csv")
Which demographics and groups of people in the United States commit suicide at the highest rates? This dataset from the U.S. Department of Health & Human Services includes information on several groups of people in the United States and their reported suicide rate. Some variables include sex, age and race, which we will explore. The dataset also includes the year of each suicide rate which enables us to understand trends over time. All of these details are critical to being able to conclude the kinds of people most at risk of ending their own life. We will then be able to tell which groups require the most help, as the goal is to decrease the frequency of suicides.
suicide_male <- suicide %>%
filter(STUB_LABEL == "Male") %>%
filter(UNIT == "Deaths per 100,000 resident population, age-adjusted") %>%
arrange(-ESTIMATE)
head(suicide_male, 5)
## INDICATOR UNIT
## 1 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## 2 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## 3 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## 4 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## 5 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## UNIT_NUM STUB_NAME STUB_NAME_NUM STUB_LABEL STUB_LABEL_NUM YEAR YEAR_NUM
## 1 1 Sex 2 Male 2.1 2018 42
## 2 1 Sex 2 Male 2.1 2017 41
## 3 1 Sex 2 Male 2.1 1986 10
## 4 1 Sex 2 Male 2.1 1987 11
## 5 1 Sex 2 Male 2.1 1990 14
## AGE AGE_NUM ESTIMATE FLAG
## 1 All ages 0 22.8
## 2 All ages 0 22.4
## 3 All ages 0 21.9
## 4 All ages 0 21.7
## 5 All ages 0 21.5
mean(suicide_male$ESTIMATE)
## [1] 20.04762
suicide_male %>%
filter(YEAR %in% c("2018", "2017")) %>%
summarize(mean(ESTIMATE))
## mean(ESTIMATE)
## 1 22.6
The top two years with the highest suicide rate among males are 2017 and 2018, with an average of 22.6 suicides per 100,000 people, age-adjusted. This is higher than the mean of the male suicide rate in the US from 1950 to 2018 of 20.05 years.
suicide_female <- suicide %>%
filter(STUB_LABEL == "Female") %>%
filter(UNIT == "Deaths per 100,000 resident population, age-adjusted") %>%
arrange(-ESTIMATE)
head(suicide_female, 5)
## INDICATOR UNIT
## 1 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## 2 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## 3 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## 4 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## 5 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## UNIT_NUM STUB_NAME STUB_NAME_NUM STUB_LABEL STUB_LABEL_NUM YEAR YEAR_NUM
## 1 1 Sex 2 Female 2.2 1970 3
## 2 1 Sex 2 Female 2.2 2018 42
## 3 1 Sex 2 Female 2.2 2017 41
## 4 1 Sex 2 Female 2.2 1981 5
## 5 1 Sex 2 Female 2.2 2015 39
## AGE AGE_NUM ESTIMATE FLAG
## 1 All ages 0 7.4
## 2 All ages 0 6.2
## 3 All ages 0 6.1
## 4 All ages 0 6.0
## 5 All ages 0 6.0
mean(suicide_female$ESTIMATE)
## [1] 5.069048
The suicide rate for females is 5.07, more than one fourth less than the male suicide rate of 22.6.
suicide_sex <- suicide %>%
filter(STUB_LABEL %in% c("Male", "Female")) %>%
filter(UNIT == "Deaths per 100,000 resident population, age-adjusted") %>%
arrange(-ESTIMATE)
head(suicide_sex, 5)
## INDICATOR UNIT
## 1 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## 2 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## 3 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## 4 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## 5 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## UNIT_NUM STUB_NAME STUB_NAME_NUM STUB_LABEL STUB_LABEL_NUM YEAR YEAR_NUM
## 1 1 Sex 2 Male 2.1 2018 42
## 2 1 Sex 2 Male 2.1 2017 41
## 3 1 Sex 2 Male 2.1 1986 10
## 4 1 Sex 2 Male 2.1 1987 11
## 5 1 Sex 2 Male 2.1 1990 14
## AGE AGE_NUM ESTIMATE FLAG
## 1 All ages 0 22.8
## 2 All ages 0 22.4
## 3 All ages 0 21.9
## 4 All ages 0 21.7
## 5 All ages 0 21.5
ggplot(data = suicide_sex) +
geom_point(aes(x = YEAR,
y = ESTIMATE,
colour = STUB_LABEL)) +
scale_color_brewer(palette = "Set1") +
labs(title = "Sucide Rates by Sex (1950-2018)",
x = "Year",
y = "Suicide Rate (Per 100,000)",
color = "Sex") +
theme_bw()
suicide_2018 <- suicide %>%
filter(UNIT == "Deaths per 100,000 resident population, crude") %>%
filter(YEAR == "2018") %>%
filter(STUB_LABEL %in% c("10-14 years", "15-19 years", "20-24 years", "25-34 years", "35-44 years", "45-54 years", "55-64 years", "65-74 years", "75-84 years")) %>%
arrange(-ESTIMATE)
head(suicide_2018, 9)
## INDICATOR UNIT
## 1 Death rates for suicide Deaths per 100,000 resident population, crude
## 2 Death rates for suicide Deaths per 100,000 resident population, crude
## 3 Death rates for suicide Deaths per 100,000 resident population, crude
## 4 Death rates for suicide Deaths per 100,000 resident population, crude
## 5 Death rates for suicide Deaths per 100,000 resident population, crude
## 6 Death rates for suicide Deaths per 100,000 resident population, crude
## 7 Death rates for suicide Deaths per 100,000 resident population, crude
## 8 Death rates for suicide Deaths per 100,000 resident population, crude
## 9 Death rates for suicide Deaths per 100,000 resident population, crude
## UNIT_NUM STUB_NAME STUB_NAME_NUM STUB_LABEL STUB_LABEL_NUM YEAR YEAR_NUM
## 1 2 Age 1 55-64 years 1.42 2018 42
## 2 2 Age 1 45-54 years 1.41 2018 42
## 3 2 Age 1 75-84 years 1.52 2018 42
## 4 2 Age 1 35-44 years 1.32 2018 42
## 5 2 Age 1 25-34 years 1.31 2018 42
## 6 2 Age 1 20-24 years 1.22 2018 42
## 7 2 Age 1 65-74 years 1.51 2018 42
## 8 2 Age 1 15-19 years 1.21 2018 42
## 9 2 Age 1 10-14 years 1.10 2018 42
## AGE AGE_NUM ESTIMATE FLAG
## 1 55-64 years 4.2 20.2
## 2 45-54 years 4.1 20.0
## 3 75-84 years 5.2 18.7
## 4 35-44 years 3.2 18.2
## 5 25-34 years 3.1 17.6
## 6 20-24 years 2.2 17.4
## 7 65-74 years 5.1 16.3
## 8 15-19 years 2.1 11.4
## 9 10-14 years 1.0 2.9
ggplot(data = suicide_2018) +
geom_bar(aes(x = AGE, y = ESTIMATE), stat = "identity") +
labs(x = "Age Group", y = "Suicide Rate (Per 100,000)", title = "2018 Suicide Rate by Age Group") +
theme_classic()
geom_point()
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity
The top two age groups most susceptible to suicide are 55-64 and 45-54 with 20.2 and 20.0 suicides per 100,000 people, respectively. This is surprising since I’ve heard that suicide is a leading cause of death for younger age groups.
suicide_young <- suicide %>%
filter(UNIT == "Deaths per 100,000 resident population, crude") %>%
filter(STUB_LABEL %in% c("10-14 years", "15-19 years", "20-24 years", "25-34 years")) %>%
arrange(-ESTIMATE)
head(suicide_2018, 9)
## INDICATOR UNIT
## 1 Death rates for suicide Deaths per 100,000 resident population, crude
## 2 Death rates for suicide Deaths per 100,000 resident population, crude
## 3 Death rates for suicide Deaths per 100,000 resident population, crude
## 4 Death rates for suicide Deaths per 100,000 resident population, crude
## 5 Death rates for suicide Deaths per 100,000 resident population, crude
## 6 Death rates for suicide Deaths per 100,000 resident population, crude
## 7 Death rates for suicide Deaths per 100,000 resident population, crude
## 8 Death rates for suicide Deaths per 100,000 resident population, crude
## 9 Death rates for suicide Deaths per 100,000 resident population, crude
## UNIT_NUM STUB_NAME STUB_NAME_NUM STUB_LABEL STUB_LABEL_NUM YEAR YEAR_NUM
## 1 2 Age 1 55-64 years 1.42 2018 42
## 2 2 Age 1 45-54 years 1.41 2018 42
## 3 2 Age 1 75-84 years 1.52 2018 42
## 4 2 Age 1 35-44 years 1.32 2018 42
## 5 2 Age 1 25-34 years 1.31 2018 42
## 6 2 Age 1 20-24 years 1.22 2018 42
## 7 2 Age 1 65-74 years 1.51 2018 42
## 8 2 Age 1 15-19 years 1.21 2018 42
## 9 2 Age 1 10-14 years 1.10 2018 42
## AGE AGE_NUM ESTIMATE FLAG
## 1 55-64 years 4.2 20.2
## 2 45-54 years 4.1 20.0
## 3 75-84 years 5.2 18.7
## 4 35-44 years 3.2 18.2
## 5 25-34 years 3.1 17.6
## 6 20-24 years 2.2 17.4
## 7 65-74 years 5.1 16.3
## 8 15-19 years 2.1 11.4
## 9 10-14 years 1.0 2.9
ggplot(data = suicide_young) +
geom_point(aes(x = YEAR,
y = ESTIMATE,
colour = STUB_LABEL)) +
scale_color_brewer(palette = "Set1") +
labs(title = "Suicide Rates, Ages 10-34 (1950-2018)",
x = "Year",
y = "Suicide Rate (Per 100,000)",
color = "Age Group") +
theme_bw()
It’s interesting to see age groups 15-19, 20-24 and 25-34 reflect similar trends. It appears that suicide rates within these ages all decreased in the 1990s and then increased significantly in the 2000s.
This data is too messy to filter solely by race in any meaningful way, so let’s look at which race is most susceptible to suicide among males since we already know their rates are higher than females’ by a wide margin.
suicide_race <- suicide %>%
filter(STUB_LABEL %in% c("Male: American Indian or Alaska Native", "Male: Asian or Pacific Islander", "Male: Black or African American", "Male: White")) %>%
filter(UNIT == "Deaths per 100,000 resident population, age-adjusted") %>%
filter(YEAR %in% c("2018")) %>%
filter(STUB_NAME == "Sex and race (Single race)") %>%
arrange(-ESTIMATE)
head(suicide_race)
## INDICATOR UNIT
## 1 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## 2 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## 3 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## UNIT_NUM STUB_NAME STUB_NAME_NUM
## 1 1 Sex and race (Single race) 8
## 2 1 Sex and race (Single race) 8
## 3 1 Sex and race (Single race) 8
## STUB_LABEL STUB_LABEL_NUM YEAR YEAR_NUM AGE
## 1 Male: White 4.11 2018 42 All ages
## 2 Male: American Indian or Alaska Native 4.13 2018 42 All ages
## 3 Male: Black or African American 4.12 2018 42 All ages
## AGE_NUM ESTIMATE FLAG
## 1 0 25.6
## 2 0 20.3
## 3 0 11.8
In 2018, We can see that White males had the highest suicide rate at 25.6, followed by American Indians or Alaska Natives at 20.3, and Blacks or African Americans at 11.8.
This time, let’s add some variance by including the last nine years of the dataset (2010-2018).
suicide_recent <- suicide %>%
filter(STUB_LABEL %in% c("Male: Black or African American", "Male: White")) %>%
filter(UNIT == "Deaths per 100,000 resident population, age-adjusted") %>%
filter(YEAR %in% c("2018", "2017", "2016", "2015", "2014", "2013", "2012", "2011", "2010")) %>%
filter(STUB_NAME == "Sex and race") %>%
arrange(-ESTIMATE)
head(suicide_race)
## INDICATOR UNIT
## 1 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## 2 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## 3 Death rates for suicide Deaths per 100,000 resident population, age-adjusted
## UNIT_NUM STUB_NAME STUB_NAME_NUM
## 1 1 Sex and race (Single race) 8
## 2 1 Sex and race (Single race) 8
## 3 1 Sex and race (Single race) 8
## STUB_LABEL STUB_LABEL_NUM YEAR YEAR_NUM AGE
## 1 Male: White 4.11 2018 42 All ages
## 2 Male: American Indian or Alaska Native 4.13 2018 42 All ages
## 3 Male: Black or African American 4.12 2018 42 All ages
## AGE_NUM ESTIMATE FLAG
## 1 0 25.6
## 2 0 20.3
## 3 0 11.8
boxplot(ESTIMATE ~ STUB_LABEL, data = suicide_recent,
ylab = "Suicide Rate (Per 100,000)", xlab = "Racial Group")
Finally, we are able to see that based off the data from 2018, the demographic most at risk of committing suicide is white males between the ages of 55 and 64.
It was challenging finding ways to condense such a vast and expansive dataset in a way that was interesting and meaningful. I kept filtering the data to limit the amount of information taken into account. It was also difficult to group the data by race, because it was lumped together with sex in numerous rows and then left alone in others. I struggled to delete superfluous and needless columns that were not relevant to what I was studying.
Now that we see the most vulnerable groups to suicide, we should look into what else is correlated to these groups. This way we can address those correlations and see what needs to be addressed and changed so that suicides can hopefully decrease. Understanding why groups of people commit suicide is critical as well. We are able to pinpoint exactly what age groups, races and sexes are at the most risk, and where those groups overlap.
One unanswered question is if ethnicity or religion plays a role in influencing suicide rates. Or, does country of origin have a strong correlation to suicide? What about sexual orientation? One other column this dataset could include is method for committing suicide. We could then understand which types of weapons are most common for each demographic and then see how accessible those weapons are in each community. Maybe we would even see if suicides are occurring more frequently in rural or urban areas, or states run by Republicans versus states run by Democrats. This kind of investigation could potentially find suicide correlations with political policies.
Publisher Centers for Disease Control and Prevention. (2022, April 28). U.S. Department of Health & Human Services - death rates for suicide, by sex, race, Hispanic origin, and age: United States. Catalog. https://catalog.data.gov/dataset/death-rates-for-suicide-by-sex-race-hispanic-origin-and-age-united-states-020c1