Project 2

Author

Oluwatosin Akinmoladun

Introduction

This project examines suicide worldwide using suicide statistics from the World Health Organization (WHO). The dataset includes the following variables:

Country (categorical)
Year (numeric, treated as a date or category)
Sex (categorical)
Age group (categorical)
Number of suicides (quantitative)
Population (quantitative)

A suicide rate per 100,000 population can be derived from the last two variables.

The data comes from the World Health Organization, the United Nations agency responsible for global public health. The WHO identifies suicide as a significant public health issue worldwide. The dataset is compiled from annual reports of deaths by suicide submitted by national statistical agencies and health ministries, which the WHO aggregates and standardizes to enable international comparisons.

I chose this dataset because suicide is a critical, complex issue that affects millions of families globally. Understanding how suicide rates vary by age, sex, and country can help identify vulnerable populations and inform prevention strategies. Personally, I believe bringing attention to this data is important to reduce stigma and encourage policies to address mental health more effectively.

Load Library

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(plotly)


Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

library(ggplot2)

Load Dataset

df_suicide <- read_csv("C:/Users/tosin/Downloads/who_suicide_statistics(1).csv")

Rows: 43776 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, sex, age
dbl (3): year, suicides_no, population

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
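
The message above suggests declaring the column types explicitly. A minimal sketch of what that could look like, assuming the six columns reported in the column specification; the inline two-row CSV and the names `toy_csv_text` and `toy_suicide` are made up for illustration and stand in for the real file:

```r
library(readr)

# A literal two-row CSV stands in for the real file; the values are invented.
toy_csv_text <- I("country,year,sex,age,suicides_no,population
Exampleland,2010,male,35-54 years,12,250000
Exampleland,2010,female,35-54 years,4,260000
")

# Declaring col_types up front silences the message and fails loudly
# if a column does not parse as expected.
toy_suicide <- read_csv(
  toy_csv_text,
  col_types = cols(
    country     = col_character(),
    year        = col_double(),
    sex         = col_character(),
    age         = col_character(),
    suicides_no = col_double(),
    population  = col_double()
  )
)
```

With the real file, the same `col_types` argument could be passed alongside the file path.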

Clean Dataset

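# Select the analysis columns and keep records from 2008 onward
df_clean <- df_suicide |>
  select(country, year, sex, age, suicides_no, population) |>
  filter(year >= 2008) |>
  # Note: despite its name, rate_per_100k holds the raw suicide count,
  # not a rate per 100,000 population
  mutate(rate_per_100k = suicides_no)

# Summarize the average suicide count by sex and age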
df_summary <- df_clean |>
  group_by(sex, age) |>
  summarise(
    avg_suicide_rate = mean(rate_per_100k, na.rm = TRUE)
  )

`summarise()` has grouped output by 'sex'. You can override using the `.groups`
argument.

# View summary table sorted by highest average count
df_summary |>
  arrange(desc(avg_suicide_rate))

# A tibble: 12 × 3
# Groups:   sex [2]
   sex    age         avg_suicide_rate
   <chr>  <chr>                  <dbl>
 1 male   35-54 years           608.  
 2 male   55-74 years           412.  
 3 male   25-34 years           277.  
 4 male   15-24 years           191.  
 5 female 35-54 years           167.  
 6 male   75+ years             156.  
 7 female 55-74 years           130.  
 8 female 25-34 years            67.1 
 9 female 75+ years              66.9 
10 female 15-24 years            56.9 
11 male   5-14 years             10.4 
12 female 5-14 years              6.85
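
The `rate_per_100k` column above is a copy of `suicides_no`, so this table reports average counts rather than population-adjusted rates. A minimal sketch of one way to compute a rate per 100,000 by sex and age group follows. The rows in `toy` are invented for illustration, and pooling suicides and population within each group is only one possible design choice, not necessarily the one intended here:

```r
library(dplyr)

# Toy rows with the same column names as the WHO file; numbers are made up.
toy <- tibble(
  country     = c("A", "A", "B", "B"),
  year        = c(2010, 2010, 2010, 2010),
  sex         = c("male", "female", "male", "female"),
  age         = rep("35-54 years", 4),
  suicides_no = c(50, 10, 200, 40),
  population  = c(100000, 100000, 1000000, 1000000)
)

# Pooled rate: total suicides divided by total population in each group,
# scaled to 100,000 people.
toy_rates <- toy |>
  filter(!is.na(suicides_no), !is.na(population), population > 0) |>
  group_by(sex, age) |>
  summarise(
    rate_per_100k = sum(suicides_no) / sum(population) * 100000,
    .groups = "drop"
  )
```

Pooling weights each country by its population. Averaging row-level rates instead, as the country comparison later in this report does, gives every row equal weight regardless of population size.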

Make your 1st plot

plot3 <- ggplot(df_summary, aes(x = age, y = avg_suicide_rate, fill = sex)) +
  geom_col(position = "dodge") +
  labs(
    title = "Average Number of Suicides by Age Group and Sex (2008 Onward)",
    x = "Age Group",
    y = "Average Number of Suicides",
    fill = "Sex",
    caption = "Data source: WHO Suicide Statistics"
  ) +
  scale_fill_manual(values = c("male" = "#0072B2", "female" = "#CC79A7")) +
  theme_dark() +
  # Label the tallest bars, drawn at the third position on the x-axis
  annotate(
    "text",
    x = 3,
    y = 650,
    label = "Peak average suicides",
    color = "red",
    size = 5,
    fontface = "bold"
  )

plot3
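
Because `age` is a character column, ggplot2 orders the bars alphabetically, which typically places "5-14 years" between "35-54 years" and "55-74 years". A minimal sketch of one way to put the age groups in chronological order with a factor; the values are retyped from the rounded summary table above, and the names `age_levels`, `toy_summary`, and `ordered_plot` are illustrative:

```r
library(dplyr)
library(ggplot2)

# Age groups in the order they should appear on the x-axis.
age_levels <- c(
  "5-14 years", "15-24 years", "25-34 years",
  "35-54 years", "55-74 years", "75+ years"
)

# Rounded averages copied from the printed summary table.
toy_summary <- tibble(
  sex = rep(c("male", "female"), each = 6),
  age = rep(age_levels, times = 2),
  avg_suicides = c(10.4, 191, 277, 608, 412, 156,
                   6.85, 56.9, 67.1, 167, 130, 66.9)
) |>
  # An explicit factor fixes the axis order instead of alphabetical sorting.
  mutate(age = factor(age, levels = age_levels))

ordered_plot <- ggplot(toy_summary, aes(x = age, y = avg_suicides, fill = sex)) +
  geom_col(position = "dodge")
```

With this ordering the "35-54 years" bars move to the fourth position, so an annotation placed at `x = 3` would need to move to `x = 4`.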

This plot shows clear demographic patterns in the average number of suicides, averaged across countries and years from 2008 onward. Middle-aged adults account for the highest averages, with males aged 35-54 peaking at about 608. Because these values are average counts rather than rates per 100,000 population, they partly reflect the size of each age group as well as underlying risk. Males exceed females in every age group, by roughly three to four times between ages 15 and 74 and by smaller margins among children aged 5-14 and adults aged 75 and over. Averages are lower among younger adults and the elderly, while the 55-74 group remains elevated but below the 35-54 peak. This age-related pattern suggests that the pressures of middle age, potentially including career stress, financial responsibilities, and life transitions, may contribute to heightened suicide risk, particularly among men.
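
The male-to-female comparison can be checked directly against the summary table. A minimal sketch, assuming the rounded values printed above (so the ratios are approximate); `summary_tbl` and `sex_ratios` are illustrative names:

```r
library(dplyr)
library(tidyr)

# Rounded averages copied from the printed summary table.
summary_tbl <- tibble(
  sex = rep(c("male", "female"), each = 6),
  age = rep(c("5-14 years", "15-24 years", "25-34 years",
              "35-54 years", "55-74 years", "75+ years"), times = 2),
  avg = c(10.4, 191, 277, 608, 412, 156,
          6.85, 56.9, 67.1, 167, 130, 66.9)
)

# One row per age group with male and female side by side, plus their ratio.
sex_ratios <- summary_tbl |>
  pivot_wider(names_from = sex, values_from = avg) |>
  mutate(male_to_female = male / female)
```

Printing `sex_ratios` gives one ratio per age group, which makes it easy to see where the gap between the sexes is widest.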

Filter data for the interactive plot

country_comparison <- df_suicide |>
  # Remove rows with missing suicide counts or missing or zero population
  filter(!is.na(suicides_no), !is.na(population), population > 0) |>
  # Calculate the suicide rate per 100,000 population for each row
  mutate(suicide_rate = (suicides_no / population) * 100000) |>
  # Group by country and average the row-level rates
  group_by(country) |>
  summarise(
    avg_rate = mean(suicide_rate, na.rm = TRUE),
    .groups = "drop"
  ) |>
  # Remove any countries with NA rates
  filter(!is.na(avg_rate)) |>
  # Sort and take the top 8
  arrange(desc(avg_rate)) |>
  slice_head(n = 8)

country_plot <- ggplot(country_comparison, aes(
  x = reorder(country, avg_rate),
  y = avg_rate,
  fill = avg_rate
)) +
  geom_col(alpha = 0.8) +
  coord_flip() +
  scale_fill_gradient(
    low = "#A7FFEB",   
    high = "#00695C"   
  ) +
  labs(
    title = "Countries with Highest Average Suicide Rates",
    x = "Country",
    y = "Average Suicide Rate (per 100,000 population)",
    fill = "Avg Rate"
  ) +
  theme_minimal()

country_plot
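
The section heading refers to an interactive plot, but the code as shown ends with a static ggplot object. A minimal sketch of how such a plot could be made interactive with `plotly::ggplotly()`, which is available because plotly is loaded earlier; the three countries and rates in `toy_countries` are placeholders, not values from the WHO data:

```r
library(ggplot2)
library(plotly)

# Placeholder data with the same columns as country_comparison.
toy_countries <- data.frame(
  country  = c("Country A", "Country B", "Country C"),
  avg_rate = c(30.5, 22.1, 18.4)
)

toy_plot <- ggplot(toy_countries, aes(
  x = reorder(country, avg_rate),
  y = avg_rate,
  fill = avg_rate
)) +
  geom_col(alpha = 0.8) +
  coord_flip() +
  labs(x = "Country", y = "Average Suicide Rate (per 100,000 population)")

# Convert the static ggplot into an interactive plotly widget with hover tooltips.
interactive_plot <- ggplotly(toy_plot)
```

Printing `interactive_plot` in a rendered HTML document displays the widget. The same call could be applied to `country_plot`, assuming the output format is HTML.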