Suicide Rates Overview 1985 to 2016

Compares socio-economic info with suicide rates by year and country

Introduction

The dataset I explore in this third project covers suicide rates from 1985 to 2016. I picked this topic for two reasons. First, my friends and I were talking about our home country, South Korea, where suicide rates are widely believed to be high because of socioeconomic pressures, and I was curious whether that belief holds up in the data. Second, the dataset is full of usable variables that make the data visualization process more interesting: sex, age group, year, number of suicides, and country.

Background Information

Suicides present a significant burden for societies around the world. In 2016, suicide was among the top 10 leading causes of death in Eastern Europe, Central Europe, Western Europe, Central Asia, Australasia, Southern Latin America, and high-income areas of North America. According to 2019 estimates from the World Health Organization (WHO), suicide caused over 700,000 deaths worldwide (about 1.3% of all deaths globally), making it the 17th leading cause of death that year. Because suicide remains so prevalent, implementing suicide prevention strategies in the most affected countries is important.

References and Data Source

United Nations Development Program. (2018). Human development index (HDI). Retrieved from http://hdr.undp.org/en/indicators/137506

World Bank. (2018). World development indicators: GDP (current US$) by country:1985 to 2016. Retrieved from http://databank.worldbank.org/data/source/world-development-indicators#

[Szamil]. (2017). Suicide in the Twenty-First Century [dataset]. Retrieved from https://www.kaggle.com/szamil/suicide-in-the-twenty-first-century/notebook

World Health Organization. (2018). Suicide prevention. Retrieved from http://www.who.int/mental_health/suicide-prevention/en/

World Health Organization. Mental Health and Substance Use. [cited 2 July 2021]. Available from: https://www.who.int/teams/mental-health-and-substance-use/data-research/suicide-data

Naghavi M; Global Burden of Disease Self-Harm Collaborators. Global, regional, and national burden of suicide mortality 1990 to 2016: systematic analysis for the Global Burden of Disease Study 2016. BMJ. 2019;364:l94.

Load Libraries

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.1     ✔ tibble    3.1.8
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)
library(ggplot2)
library(plotly) # Interactive data visualizations

## Warning: package 'plotly' was built under R version 4.2.3

## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

library(highcharter) # Interactive data visualizations

## Warning: package 'highcharter' was built under R version 4.2.3

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(viridis) # Color gradients

## Loading required package: viridisLite

library(countrycode) # Converts country names/codes and can group continents

## Warning: package 'countrycode' was built under R version 4.2.3

library(rjson) # JSON reader
library(crosstalk) # Provides interactivity for HTML widgets

## Warning: package 'crosstalk' was built under R version 4.2.3

knitr::include_graphics("C:/Users/andre/OneDrive/Documents/Data/2019-07-cover-suicide_tcm7-258230.jpg")

Set Working Directory and Load Dataset

While looking through the dataset, I found that a good number of countries had too little data for 2016, and that two countries, Dominica and Saint Kitts and Nevis, had too little data overall. I therefore removed the year 2016 and those two countries.

# Read in data. 
data <- read.csv('master.csv') %>%
  filter(year != 2016,
         country != 'Dominica',
         country != 'Saint Kitts and Nevis')
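
One way to spot sparse years like this is to count how many distinct countries report in each year. The sketch below uses a made-up data frame (`toy` and its values are hypothetical, not taken from master.csv) and base R only:

```r
# Hypothetical miniature of the dataset; the real file has many more rows.
toy <- data.frame(
  country = c("A", "B", "C", "A", "B", "A"),
  year    = c(2014, 2014, 2014, 2015, 2015, 2016)
)

# Number of distinct countries reporting in each year.
countries_per_year <- tapply(toy$country, toy$year,
                             function(x) length(unique(x)))
countries_per_year
```

A year whose count drops well below its neighbours is a candidate for removal, which is the kind of check that led me to drop 2016.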

When I tried to merge the dataset with map data, I got errors because some countries in the data are named differently from the names the map uses. I therefore renamed those countries to match the map's names so that they are recognized and displayed later on.

data <- data %>%
  mutate(country = fct_recode(country, "The Bahamas" = "Bahamas"),
         country = fct_recode(country, "Cape Verde" = "Cabo Verde"),
         country = fct_recode(country, "South Korea" = "Republic of Korea"),
         country = fct_recode(country, "Russia" = "Russian Federation"),
         country = fct_recode(country, "Republic of Serbia" = "Serbia"),
         country = fct_recode(country, "United States of America" = "United States"))
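
A quick way to find names that need recoding is to compare the two sets of names directly. This is a minimal sketch with hypothetical vectors (`data_names` and `map_names` are stand-ins, not the real map's name list):

```r
# Hypothetical country names: one vector from the data, one from a map source.
data_names <- c("Bahamas", "Republic of Korea", "Japan")
map_names  <- c("The Bahamas", "South Korea", "Japan")

# Names in the data that the map would not recognise.
unmatched <- setdiff(data_names, map_names)
unmatched
```

Each name returned by `setdiff()` is one that would need an entry in a recode step like the one above.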

I also set the levels of age explicitly, in ascending order, so that the age groups are not placed in arbitrary order when plotting the line graphs.

data$age <- factor(data$age, levels = c("5-14 years", "15-24 years", "25-34 years", "35-54 years", "55-74 years", "75+ years"))
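
To see why the explicit levels matter, here is a small sketch on a hypothetical subset of the age labels. Without `levels`, R sorts the labels as text, so "15-24 years" lands ahead of "5-14 years":

```r
# A hypothetical subset of the age labels, in no particular order.
ages <- c("5-14 years", "15-24 years", "75+ years", "25-34 years")

# Default levels are sorted as text, not by age.
default_levels <- levels(factor(ages))

# Explicit levels keep the groups in ascending age order.
ordered_levels <- levels(factor(
  ages,
  levels = c("5-14 years", "15-24 years", "25-34 years", "75+ years")
))

default_levels
ordered_levels
```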

Create a custom theme for the plots.

custom_theme <- hc_theme(
  colors = c('#5CACEE', 'green', 'red'),
  chart = list(
         backgroundColor = '#FAFAFA', 
         plotBorderColor = "black"),
  xAxis = list(
         gridLineColor = "#E5E5E5", 
         labels = list(style = list(color = "#333333")), 
         lineColor = "#E5E5E5", 
         minorGridLineColor = "#E5E5E5", 
         tickColor = "#E5E5E5", 
         title = list(style = list(color = "#333333"))), 
  yAxis = list(
         gridLineColor = "#E5E5E5", 
         labels = list(style = list(color = "#333333")), 
         lineColor = "#E5E5E5", 
         minorGridLineColor = "#E5E5E5", 
         tickColor = "#E5E5E5", 
         tickWidth = 1, 
         title = list(style = list(color = "#333333"))),   
  title = list(style = list(color = '#333333', fontFamily = "Lato")),
  subtitle = list(style = list(color = '#666666', fontFamily = "Lato")),
  legend = list(
         itemStyle = list(color = "#333333"), 
         itemHoverStyle = list(color = "#FFF"), 
         itemHiddenStyle = list(color = "#606063")), 
  credits = list(style = list(color = "#666")),
  itemHoverStyle = list(color = 'gray'))

Worldwide suicides by Year

# Create tibble for our line plot.  
overall_tibble <- data %>%
  select(year, suicides_no, population) %>%
  group_by(year) %>%
  summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))

# Create a line plot.
highchart() %>% 
  
    hc_add_series(overall_tibble, hcaes(x = year, y = suicide_capita, color = suicide_capita), type = "line") %>%
  
    hc_tooltip(crosshairs = TRUE, borderWidth = 1.5, headerFormat = "", pointFormat = paste("Year: <b>{point.x}</b> <br> Suicides: <b>{point.y}</b>")) %>%
  
    hc_title(text = "Worldwide Suicides by Year") %>% 
    hc_subtitle(text = "1985-2015") %>%
    hc_xAxis(title = list(text = "Year")) %>%
    hc_yAxis(title = list(text = "Suicides per 100K people"),
             allowDecimals = FALSE,
             plotLines = list(list(
                    color = "black", width = 1, dashStyle = "Dash", 
                    value = mean(overall_tibble$suicide_capita),
                    label = list(text = "Mean = 13.12", 
                                 style = list(color = "black", fontSize = 11))))) %>%
    hc_legend(enabled = FALSE) %>% 
    hc_add_theme(custom_theme)
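
The `suicide_capita` column above is a pooled rate: total suicides divided by total population, scaled to 100,000 people. As a worked example with hypothetical counts (the numbers below are made up for illustration), pooling is not the same as averaging per-country rates:

```r
# Hypothetical counts for two countries in one year.
suicides_no <- c(200, 50)
population  <- c(1000000, 500000)

# Pooled rate: total suicides over total population, per 100,000 people.
pooled_rate <- round(sum(suicides_no) / sum(population) * 100000, 2)

# Averaging the two per-country rates gives a different number,
# because it ignores the difference in population size.
mean_of_rates <- round(mean(suicides_no / population * 100000), 2)

pooled_rate
mean_of_rates
```

The pooled version weights each country by its population, which is why the tibbles in this report sum first and divide afterwards.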

Worldwide suicides by Gender

# Create tibble for sex so we can use it when creating our line plot.  
sex_tibble <- data %>%
  select(year, sex, suicides_no, population) %>%
  group_by(year, sex) %>%
  summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
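
The message above is informational. If it is unwanted, `summarise()` accepts a `.groups` argument. A minimal sketch on a hypothetical miniature of the dataset (`toy` and its values are made up):

```r
library(dplyr)

# Hypothetical miniature of the dataset.
toy <- data.frame(
  year        = c(2000, 2000, 2001, 2001),
  sex         = c("female", "male", "female", "male"),
  suicides_no = c(10, 30, 12, 28),
  population  = c(100000, 100000, 100000, 100000)
)

rates <- toy %>%
  group_by(year, sex) %>%
  summarise(
    suicide_capita = round(sum(suicides_no) / sum(population) * 100000, 2),
    .groups = "drop"  # return an ungrouped tibble and skip the message
  )

rates
```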

# Pick color for gender.
sex_color <- c("#EE6AA7", "#87CEEB") # pink (female) & baby blue (male)

# Create line plot.
highchart() %>% 
    hc_add_series(sex_tibble, hcaes(x = year, y = suicide_capita, group = sex), type = "line", color = sex_color) %>%
    hc_tooltip(crosshairs = TRUE, borderWidth = 1.5, headerFormat = "", pointFormat = paste("Year: <b>{point.x}</b> <br>","Gender: <b>{point.sex}</b><br>", "Suicides: <b>{point.y}</b>")) %>%
    hc_title(text = "Worldwide suicides by Gender") %>% 
    hc_subtitle(text = "1985-2015") %>%
    hc_xAxis(title = list(text = "Year")) %>%
    hc_yAxis(title = list(text = "Suicides per 100K people"),
             allowDecimals = FALSE,
             plotLines = list(list(
                    color = "black", width = 1, dashStyle = "Dash",
                    value = mean(overall_tibble$suicide_capita),
                    label = list(text = "Mean = 13.12", 
                                 style = list(color = 'black', fontSize = 11))))) %>% 
    hc_add_theme(custom_theme)
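
In these plots the dashed line's position comes from `mean(overall_tibble$suicide_capita)`, while its label text is typed by hand. One way to keep the two in sync is to build the label from the same value. This is only a sketch: `yearly_rates` is a hypothetical stand-in for `overall_tibble$suicide_capita`, and `mean_label` is a name introduced here for illustration.

```r
# Hypothetical yearly rates standing in for overall_tibble$suicide_capita.
yearly_rates <- c(12.5, 13.0, 14.0)

# Compute the mean once and format it to two decimals for the label.
mean_rate  <- mean(yearly_rates)
mean_label <- sprintf("Mean = %.2f", mean_rate)

mean_label
```

The resulting string could then be passed as `label = list(text = mean_label, ...)` inside the `plotLines` list.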

Worldwide suicides by Age Group

# Create tibble for age so we can use it when creating our line plot.  
age_tibble <- data %>%
  select(year, age, suicides_no, population) %>%
  group_by(year, age) %>%
  summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

# Pick color for graph. 
age_color <- rev(plasma(6))

# Create a line plot.
highchart() %>% 
    hc_add_series(age_tibble, hcaes(x = year, y = suicide_capita, group = age), type = "line", color = age_color) %>%
    hc_tooltip(crosshairs = TRUE, borderWidth = 1.5, headerFormat = "", pointFormat = paste("Year: <b>{point.x}</b> <br>","Age: <b>{point.age}</b><br>", "Suicides: <b>{point.y}</b>")) %>%
    hc_title(text = "Worldwide suicides by Age") %>% 
    hc_subtitle(text = "1985-2015") %>%
    hc_xAxis(title = list(text = "Year")) %>%
    hc_yAxis(title = list(text = "Suicides per 100K people"),
             allowDecimals = FALSE,
             plotLines = list(list(
                    color = "black", width = 1, dashStyle = "Dash",
                    value = mean(overall_tibble$suicide_capita),
                    label = list(text = "Mean = 13.12", 
                                 style = list(color = 'black', fontSize = 11))))) %>% 
    hc_add_theme(custom_theme)

# First, make a tibble of suicide by sex. We will use this for our pie chart.
pie_sex <- data %>%
  select(sex, suicides_no, population) %>%
  group_by(sex) %>%
  summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))
  
# Create pie chart for sex. 
highchart() %>% 
  hc_add_series(pie_sex, hcaes(x = sex, y = suicide_capita, 
                               color = sex_color), type = "pie") %>%
  hc_tooltip(borderWidth = 1.5, headerFormat = "", pointFormat = paste("Gender: <b>{point.sex} ({point.percentage:.1f}%)</b> <br> Suicides per 100K: <b>{point.y}</b>")) %>%
  hc_title(text = "<b>Worldwide suicides by Gender</b>", style = (list(fontSize = '14px'))) %>% 
  hc_subtitle(text = "1985-2015", style = (list(fontSize = '10px'))) %>%
  hc_plotOptions(pie = list(dataLabels = list(distance = 5, 
                            style = list(fontSize = 10)), 
                            size = 130)) %>% 
  hc_add_theme(custom_theme)
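
A note on reading the tooltip: as far as I understand Highcharts pie series, `{point.percentage}` is each slice's value as a share of the series total. Because the values here are per-100K rates, the percentage is a share of the summed rates, not a share of all suicide deaths. A sketch of that arithmetic with hypothetical rates:

```r
# Hypothetical per-100K rates for the two slices.
rates <- c(female = 5, male = 15)

# Share of each slice: its value divided by the total of all slices.
slice_pct <- round(rates / sum(rates) * 100, 1)

slice_pct
```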

Worldwide suicides by Age

# First, create a tibble of suicide by Age. We will use this for our pie chart.
pie_age <- data %>%
  select(age, suicides_no, population) %>%
  group_by(age) %>%
  summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2)) %>%
  arrange(suicide_capita)

# Create pie chart for Age. 
highchart() %>% 
  hc_add_series(pie_age, hcaes(x = age, y = suicide_capita, 
                               color = age_color), type = "pie") %>%
  hc_tooltip(borderWidth = 1.5, headerFormat = "", pointFormat = paste("Age: <b>{point.age} ({point.percentage:.1f}%)</b> <br> Suicides per 100K: <b>{point.y}</b>")) %>%  
  hc_title(text = "<b>Worldwide suicides by Age</b>", style = (list(fontSize = '14px'))) %>% 
  hc_subtitle(text = "1985-2015", style = (list(fontSize = '10px'))) %>%
  hc_plotOptions(pie = list(dataLabels = list(distance = 5, 
                            style = list(fontSize = 10)), 
                            size = 130)) %>% 
  hc_add_theme(custom_theme)

I searched for a way to group the countries into continents and learned that the countrycode library can extract continents from country names. I then created a new column for continents.

data$continent <- countrycode(sourcevar = data$country,
                              origin = "country.name",
                              destination = "continent")

# I then found out that I had to reclassify countries that have been coded as 'Americas', by countrycode(), into 'North America' and 'South America'. 

south_america <- c('Argentina', 'Brazil', 'Chile', 'Colombia', 'Ecuador', 'Guyana', 'Paraguay', 'Suriname', 'Uruguay')

data$continent[data$country %in% south_america] <- 'South America'
data$continent[data$continent=='Americas'] <- 'North America'
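
The order of the two assignments above matters: South American countries are relabelled first, so only the rows still marked 'Americas' become 'North America'. A minimal sketch on a hypothetical three-row data frame (`toy` is made up and does not call `countrycode()`):

```r
# Hypothetical rows as they might look after the continent lookup.
toy <- data.frame(
  country   = c("Brazil", "Canada", "Japan"),
  continent = c("Americas", "Americas", "Asia"),
  stringsAsFactors = FALSE
)
south_america <- c("Brazil", "Chile")

# Step 1: countries on the South America list.
toy$continent[toy$country %in% south_america] <- "South America"

# Step 2: whatever is still labelled 'Americas' becomes North America.
toy$continent[toy$continent == "Americas"] <- "North America"

toy
```

Running step 2 before step 1 would first label Brazil as North America, although step 1 would then correct it; keeping the specific rule first makes the intent easier to follow.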

Suicides by continent and Gender

# Create a tibble for continent and sex.
continent_sex_tibble <- data %>%
  select(continent, sex, suicides_no, population) %>%
  group_by(continent, sex) %>%
  summarize(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))

## `summarise()` has grouped output by 'continent'. You can override using the
## `.groups` argument.

# Create a grouped column chart of suicides by continent.
highchart() %>%
    hc_add_series(continent_sex_tibble, hcaes(x = continent, y = suicide_capita, group = sex), type = "column") %>% 
    hc_colors(colors = sex_color) %>%
    hc_title(text = "Suicides by continent and <b>Gender</b>", style = (list(fontSize = '14px'))) %>% 
    hc_subtitle(text = "1985-2015") %>%
    hc_tooltip(borderWidth = 1.5, pointFormat = paste("Gender: <b> {point.sex} </b> <br> Suicides: <b>{point.y}</b>")) %>%
    hc_xAxis(categories = c("Africa", "Asia", "Europe", "North <br> America", "Oceania", "South <br> America"), labels = list(style = list(fontSize = 8))) %>%
    hc_yAxis(labels = list(style = list(fontSize = 10)),
             title = list(text = "Suicides per 100K people",
             style = list(fontSize = 10)),
        plotLines = list(
          list(color = "black", width = 1, dashStyle = "Dash", 
               value = mean(overall_tibble$suicide_capita),
               label = list(text = "Mean = 13.12", style = list(color = "black", fontSize = 6))))) %>%     
    hc_legend(verticalAlign = 'top', enabled = FALSE) %>% 
    hc_add_theme(custom_theme)

Suicides by continent and Age

# Create a tibble for continent and age.
continent_age_tibble <- data %>%
  select(continent, age, suicides_no, population) %>%
  group_by(continent, age) %>%
  summarize(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))

## `summarise()` has grouped output by 'continent'. You can override using the
## `.groups` argument.

# Create a grouped column chart of suicides by continent.
highchart() %>%
    hc_add_series(continent_age_tibble, hcaes(x = continent, y = suicide_capita, group = age), type = "column") %>% 
    hc_colors(colors = age_color) %>%
    hc_title(text = "Suicides by continent and <b>Age</b>", style = (list(fontSize = '14px'))) %>% 
    hc_subtitle(text = "1985-2015") %>%
    hc_tooltip(borderWidth = 1.5, pointFormat = paste("Age: <b> {point.age} </b> <br> Suicides: <b>{point.y}</b>")) %>%
    hc_xAxis(categories = c("Africa", "Asia", "Europe", "North <br> America", "Oceania", "South <br> America"), labels = list(style = list(fontSize = 8))) %>%
    hc_yAxis(labels = list(style = list(fontSize = 10)),
             title = list(text = "Suicides per 100K people",
                          style = list(fontSize = 10)),
        plotLines = list(
          list(color = "black", width = 1, dashStyle = "Dash", 
               value = mean(overall_tibble$suicide_capita),
               label = list(text = "Mean = 13.12", style = list(color = "black", fontSize = 6))))) %>%    
    hc_legend(verticalAlign = 'top', enabled = FALSE) %>% 
    hc_add_theme(custom_theme)

Suicides by Country

# Create tibble for overall suicides by country
country_bar <- data %>%
  select(country, suicides_no, population) %>%
  group_by(country) %>%
  summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2)) %>%
  arrange(desc(suicide_capita))

# Create interactive bar plot
highchart() %>%
    hc_add_series(country_bar, hcaes(x = country, y = suicide_capita, color = suicide_capita), type = "bar")  %>% 
    hc_tooltip(borderWidth = 1.5, 
               pointFormat = paste("Suicides: <b>{point.y}</b>")) %>%
    hc_legend(enabled = FALSE) %>%
    hc_title(text = "Suicides by country") %>% 
    hc_subtitle(text = "1985-2015") %>%
    hc_xAxis(categories = country_bar$country, 
             labels = list(step = 1),
             min = 0, max = 25,
             scrollbar = list(enabled = TRUE)) %>%
    hc_yAxis(title = list(text = "Suicides per 100K people")) %>%
    hc_plotOptions(bar = list(stacking = "normal", 
                              pointPadding = 0, groupPadding = 0, borderWidth = 0.5)) %>% 
    hc_add_theme(custom_theme)

Suicide Rates

Andrew Kwak

2023-05-08