The dataset I explore for this third project covers suicide rates from 1985 to 2016. I picked this topic for two reasons. First, my friends and I had been talking about suicide rates in our home country, South Korea, where rates are widely believed to be high because of socioeconomic pressures, and that made me curious whether the belief holds up in the data. Second, the dataset has plenty of usable variables, which makes the data visualization process more interesting: sex, age group, year, number of suicides, population, and country.
Suicide is a significant burden for societies around the world. In 2016, suicide was among the top 10 leading causes of death in Eastern Europe, Central Europe, Western Europe, Central Asia, Australasia, Southern Latin America, and high-income areas of North America. According to the 2019 estimates from the World Health Organization (WHO), suicide caused over 700,000 deaths worldwide (about 1.3% of all deaths globally), making it the 17th leading cause of death in 2019. Because suicide is so prevalent, it is important to implement suicide prevention strategies, particularly in the countries most affected.
United Nations Development Program. (2018). Human development index (HDI). Retrieved from http://hdr.undp.org/en/indicators/137506
World Bank. (2018). World development indicators: GDP (current US$) by country:1985 to 2016. Retrieved from http://databank.worldbank.org/data/source/world-development-indicators#
[Szamil]. (2017). Suicide in the Twenty-First Century [dataset]. Retrieved from https://www.kaggle.com/szamil/suicide-in-the-twenty-first-century/notebook
World Health Organization. (2018). Suicide prevention. Retrieved from http://www.who.int/mental_health/suicide-prevention/en/
World Health Organization. Mental Health and Substance Use. [cited 2 July 2021]. Available from: https://www.who.int/teams/mental-health-and-substance-use/data-research/suicide-data
Naghavi M; Global Burden of Disease Self-Harm Collaborators. Global, regional, and national burden of suicide mortality 1990 to 2016: systematic analysis for the Global Burden of Disease Study 2016. BMJ. 2019;364:l94.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.0 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.1 ✔ tibble 3.1.8
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
library(plotly)# Interactive data visualizations
## Warning: package 'plotly' was built under R version 4.2.3
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(highcharter) # Interactive data visualizations
## Warning: package 'highcharter' was built under R version 4.2.3
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(viridis) # Color gradients
## Loading required package: viridisLite
library(countrycode) # Converts country names/codes and can group continents
## Warning: package 'countrycode' was built under R version 4.2.3
library(rjson) # JSON reader
library(crosstalk) # Provides interactivity for HTML widgets
## Warning: package 'crosstalk' was built under R version 4.2.3
knitr::include_graphics("C:/Users/andre/OneDrive/Documents/Data/2019-07-cover-suicide_tcm7-258230.jpg")
While looking through the dataset, I found that many countries had little or no data for 2016, and that two countries, Dominica and Saint Kitts and Nevis, had too little data overall. I therefore removed the year 2016 and those two countries.
# Read in data.
data <- read.csv('master.csv') %>%
filter(year != 2016,
country != 'Dominica',
country != 'Saint Kitts and Nevis')
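A minimal sketch of how sparse years and countries could be spotted before filtering. The toy data frame and the `min_years` cut-off are assumptions made up for illustration; they are not taken from master.csv.

```r
# Toy stand-in for master.csv; these rows are invented for illustration.
toy <- data.frame(
  country = c("A", "A", "A", "A", "B", "B", "B", "C"),
  year    = c(2014, 2015, 2015, 2016, 2014, 2015, 2016, 2015)
)

# Number of distinct countries reporting in each year.
countries_per_year <- tapply(toy$country, toy$year,
                             function(x) length(unique(x)))

# Number of distinct years reported by each country.
years_per_country <- tapply(toy$year, toy$country,
                            function(x) length(unique(x)))

# Flag countries that fall below a hypothetical cut-off of 2 reported years.
min_years <- 2
sparse_countries <- names(years_per_country)[years_per_country < min_years]

print(countries_per_year)
print(sparse_countries)
```

On the real data, the same two counts would show which years and countries are thin enough to drop.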
When I first tried to merge the dataset with the map, I got errors because some countries in the data are named differently from the map. I renamed those countries to match the names the map uses later on, so that they are matched and displayed.
data <- data %>%
mutate(country = fct_recode(country, "The Bahamas" = "Bahamas"),
country = fct_recode(country, "Cape Verde" = "Cabo Verde"),
country = fct_recode(country, "South Korea" = "Republic of Korea"),
country = fct_recode(country, "Russia" = "Russian Federation"),
country = fct_recode(country, "Republic of Serbia" = "Serbia"),
country = fct_recode(country, "United States of America" = "United States"))
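One way to find which names need recoding is to compare the two sets of names directly. The sketch below uses base R only; `map_names` is a hypothetical stand-in for the names stored in the map object, and `data_names` is a small made-up subset of the data.

```r
# Country names as they appear in the data (small illustrative subset).
data_names <- c("Bahamas", "Japan", "Republic of Korea", "Russian Federation")

# Names the map is assumed to use (hypothetical; read them from the real map).
map_names <- c("The Bahamas", "Japan", "South Korea", "Russia")

# Names present in the data but missing from the map: these need recoding.
unmatched <- setdiff(data_names, map_names)

# Recode with a named lookup vector (old name -> map name).
lookup <- c("Bahamas"            = "The Bahamas",
            "Republic of Korea"  = "South Korea",
            "Russian Federation" = "Russia")
fixed_names <- unname(ifelse(data_names %in% names(lookup),
                             lookup[data_names],
                             data_names))

# After recoding, nothing should be left unmatched.
still_unmatched <- setdiff(fixed_names, map_names)
print(unmatched)
print(still_unmatched)
```

Running the `setdiff()` check again after recoding confirms that every country will join to the map.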
I also set the age levels explicitly, from youngest to oldest, so that the age groups appear in order in the line graphs instead of being placed alphabetically.
data$age <- factor(data$age, levels = c("5-14 years", "15-24 years", "25-34 years", "35-54 years", "55-74 years", "75+ years"))
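A small sketch of why the explicit levels matter. The `ages` vector is made up for illustration; the level labels are the ones used above.

```r
# A few age labels in arbitrary order (illustrative values only).
ages <- c("5-14 years", "15-24 years", "75+ years", "25-34 years")

# Levels listed from youngest to oldest, as in the report.
age_levels <- c("5-14 years", "15-24 years", "25-34 years",
                "35-54 years", "55-74 years", "75+ years")

ages_f <- factor(ages, levels = age_levels)

# Sorting a factor follows the level order, not the alphabet.
sorted_ages <- as.character(sort(ages_f))

# Any label that does not match a level becomes NA, so this doubles as a typo check.
has_unknown_label <- anyNA(ages_f)

print(sorted_ages)
```

Sorting the plain character vector would instead place "15-24 years" before "5-14 years".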
custom_theme <- hc_theme(
colors = c('#5CACEE', 'green', 'red'),
chart = list(
backgroundColor = '#FAFAFA',
plotBorderColor = "black"),
xAxis = list(
gridLineColor = "#E5E5E5",
labels = list(style = list(color = "#333333")),
lineColor = "#E5E5E5",
minorGridLineColor = "#E5E5E5",
tickColor = "#E5E5E5",
title = list(style = list(color = "#333333"))),
yAxis = list(
gridLineColor = "#E5E5E5",
labels = list(style = list(color = "#333333")),
lineColor = "#E5E5E5",
minorGridLineColor = "#E5E5E5",
tickColor = "#E5E5E5",
tickWidth = 1,
title = list(style = list(color = "#333333"))),
title = list(style = list(color = '#333333', fontFamily = "Lato")),
subtitle = list(style = list(color = '#666666', fontFamily = "Lato")),
legend = list(
itemStyle = list(color = "#333333"),
itemHoverStyle = list(color = "#FFF"),
itemHiddenStyle = list(color = "#606063")),
credits = list(style = list(color = "#666")),
itemHoverStyle = list(color = 'gray'))
# Create tibble for our line plot.
overall_tibble <- data %>%
select(year, suicides_no, population) %>%
group_by(year) %>%
summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))
# Create a line plot.
highchart() %>%
hc_add_series(overall_tibble, hcaes(x = year, y = suicide_capita, color = suicide_capita), type = "line") %>%
hc_tooltip(crosshairs = TRUE, borderWidth = 1.5, headerFormat = "", pointFormat = paste("Year: <b>{point.x}</b> <br> Suicides: <b>{point.y}</b>")) %>%
hc_title(text = "Worldwide Suicides by Year") %>%
hc_subtitle(text = "1985-2015") %>%
hc_xAxis(title = list(text = "Year")) %>%
hc_yAxis(title = list(text = "Suicides per 100K people"),
allowDecimals = FALSE,
plotLines = list(list(
color = "black", width = 1, dashStyle = "Dash",
value = mean(overall_tibble$suicide_capita),
label = list(text = "Mean = 13.12",
style = list(color = "black", fontSize = 11))))) %>%
hc_legend(enabled = FALSE) %>%
hc_add_theme(custom_theme)
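The rate used throughout is a pooled rate: total suicides divided by total population, times 100,000. The sketch below, on made-up numbers, shows how that differs from averaging per-row rates. It also shows how the "Mean = ..." label could be built from the computed value instead of being typed by hand. The names `toy`, `yearly_rates` and `mean_label` are hypothetical.

```r
# Two made-up rows with very different population sizes.
toy <- data.frame(
  suicides_no = c(10, 300),
  population  = c(100000, 1000000)
)

# Pooled rate: weights each row by its population.
pooled_rate <- sum(toy$suicides_no) / sum(toy$population) * 100000

# Mean of per-row rates: treats both rows equally, so it differs.
mean_of_rates <- mean(toy$suicides_no / toy$population * 100000)

# Build the plot-line label from the data (illustrative yearly rates).
yearly_rates <- c(12.5, 13, 13.75)
mean_label <- sprintf("Mean = %.2f", mean(yearly_rates))

print(c(pooled = pooled_rate, mean_of_rates = mean_of_rates))
print(mean_label)
```

Passing a string like `mean_label` as the label text would keep the label in step with the dashed line if the filtering changes.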
# Create tibble for sex so we can use it when creating our line plot.
sex_tibble <- data %>%
select(year, sex, suicides_no, population) %>%
group_by(year, sex) %>%
summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
# Pick color for gender.
sex_color <- c("#EE6AA7", "#87CEEB") # pink & sky blue, in the order female, male
# Create line plot.
highchart() %>%
hc_add_series(sex_tibble, hcaes(x = year, y = suicide_capita, group = sex), type = "line", color = sex_color) %>%
hc_tooltip(crosshairs = TRUE, borderWidth = 1.5, headerFormat = "", pointFormat = paste("Year: <b>{point.x}</b> <br>","Gender: <b>{point.sex}</b><br>", "Suicides: <b>{point.y}</b>")) %>%
hc_title(text = "Worldwide suicides by Gender") %>%
hc_subtitle(text = "1985-2015") %>%
hc_xAxis(title = list(text = "Year")) %>%
hc_yAxis(title = list(text = "Suicides per 100K people"),
allowDecimals = FALSE,
plotLines = list(list(
color = "black", width = 1, dashStyle = "Dash",
value = mean(overall_tibble$suicide_capita),
label = list(text = "Mean = 13.12",
style = list(color = 'black', fontSize = 11))))) %>%
hc_add_theme(custom_theme)
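The `summarise()` message in the output above is informational: after grouping by two variables, the result is still grouped by the first one. A minimal sketch of silencing it with the `.groups` argument, using a made-up data frame:

```r
library(dplyr)

# Made-up rows for illustration only.
toy <- data.frame(
  year        = c(2000, 2000, 2001, 2001),
  sex         = c("female", "male", "female", "male"),
  suicides_no = c(5, 15, 6, 18),
  population  = c(100000, 100000, 100000, 100000)
)

# .groups = "drop" returns an ungrouped tibble and suppresses the message.
rates <- toy %>%
  group_by(year, sex) %>%
  summarise(
    suicide_capita = round(sum(suicides_no) / sum(population) * 100000, 2),
    .groups = "drop"
  )

print(rates)
```

Dropping the groups also avoids surprises if further `mutate()` or `summarise()` steps are added later.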
# Create tibble for age so we can use it when creating our line plot.
age_tibble <- data %>%
select(year, age, suicides_no, population) %>%
group_by(year, age) %>%
summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
# Pick color for graph.
age_color <- rev(plasma(6))
# Create a line plot.
highchart() %>%
hc_add_series(age_tibble, hcaes(x = year, y = suicide_capita, group = age), type = "line", color = age_color) %>%
hc_tooltip(crosshairs = TRUE, borderWidth = 1.5, headerFormat = "", pointFormat = paste("Year: <b>{point.x}</b> <br>","Age: <b>{point.age}</b><br>", "Suicides: <b>{point.y}</b>")) %>%
hc_title(text = "Worldwide suicides by Age") %>%
hc_subtitle(text = "1985-2015") %>%
hc_xAxis(title = list(text = "Year")) %>%
hc_yAxis(title = list(text = "Suicides per 100K people"),
allowDecimals = FALSE,
plotLines = list(list(
color = "black", width = 1, dashStyle = "Dash",
value = mean(overall_tibble$suicide_capita),
label = list(text = "Mean = 13.12",
style = list(color = 'black', fontSize = 11))))) %>%
hc_add_theme(custom_theme)
# First, make a tibble of suicide by sex. We will use this for our pie chart.
pie_sex <- data %>%
select(sex, suicides_no, population) %>%
group_by(sex) %>%
summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))
# Create pie chart for sex.
highchart() %>%
hc_add_series(pie_sex, hcaes(x = sex, y = suicide_capita,
color = sex_color), type = "pie") %>%
hc_tooltip(borderWidth = 1.5, headerFormat = "", pointFormat = paste("Gender: <b>{point.sex} ({point.percentage:.1f}%)</b> <br> Suicides per 100K: <b>{point.y}</b>")) %>%
hc_title(text = "<b>Worldwide suicides by Gender</b>", style = (list(fontSize = '14px'))) %>%
hc_subtitle(text = "1985-2015", style = (list(fontSize = '10px'))) %>%
hc_plotOptions(pie = list(dataLabels = list(distance = 5,
style = list(fontSize = 10)),
size = 130)) %>%
hc_add_theme(custom_theme)
# First, create a tibble of suicide by Age. We will use this for our pie chart.
pie_age <- data %>%
select(age, suicides_no, population) %>%
group_by(age) %>%
summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2)) %>%
arrange(suicide_capita)
# Create pie chart for Age.
highchart() %>%
hc_add_series(pie_age, hcaes(x = age, y = suicide_capita,
color = age_color), type = "pie") %>%
hc_tooltip(borderWidth = 1.5, headerFormat = "", pointFormat = paste("Age: <b>{point.age} ({point.percentage:.1f}%)</b> <br> Suicides per 100K: <b>{point.y}</b>")) %>%
hc_title(text = "<b>Worldwide suicides by Age</b>", style = (list(fontSize = '14px'))) %>%
hc_subtitle(text = "1985-2015", style = (list(fontSize = '10px'))) %>%
hc_plotOptions(pie = list(dataLabels = list(distance = 5,
style = list(fontSize = 10)),
size = 130)) %>%
hc_add_theme(custom_theme)
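One thing to keep in mind when reading these pie charts: the percentage in the tooltip is each slice's value divided by the sum of all slice values. Because the slices here are rates per 100K, the percentage is a share of summed rates, not a share of all suicides. A small sketch on made-up numbers shows the difference:

```r
# Made-up totals with unequal population sizes.
pie_toy <- data.frame(
  sex         = c("female", "male"),
  suicides_no = c(200, 800),
  population  = c(1200000, 800000)
)

# Rate per 100K for each group (what the pie slices hold).
pie_toy$rate <- pie_toy$suicides_no / pie_toy$population * 100000

# Share of summed rates: what a pie built on rates displays.
share_of_rates <- round(pie_toy$rate / sum(pie_toy$rate) * 100, 1)

# Share of raw counts: the share of all suicides in each group.
share_of_counts <- round(pie_toy$suicides_no / sum(pie_toy$suicides_no) * 100, 1)

print(data.frame(sex = pie_toy$sex, share_of_rates, share_of_counts))
```

The two shares coincide only when the groups have equal populations.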
I searched for a way to group the countries into continents and learned that the countrycode library can derive continents from country names. I then created a new column for continents.
data$continent <- countrycode(sourcevar = data$country,
origin = "country.name",
destination = "continent")
# I then found out that I had to reclassify countries that have been coded as 'Americas', by countrycode(), into 'North America' and 'South America'.
south_america <- c('Argentina', 'Brazil', 'Chile', 'Colombia', 'Ecuador', 'Guyana', 'Paraguay', 'Suriname', 'Uruguay')
data$continent[data$country %in% south_america] <- 'South America'
data$continent[data$continent=='Americas'] <- 'North America'
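The order of the two assignments above matters: the South American countries have to be relabelled first, so that only the remaining 'Americas' rows become 'North America'. A minimal base R sketch on a made-up data frame, with the continent column filled in by hand rather than by countrycode():

```r
# Made-up rows; continent is typed in here instead of coming from countrycode().
toy <- data.frame(
  country   = c("Brazil", "Canada", "Chile", "Mexico", "France"),
  continent = c("Americas", "Americas", "Americas", "Americas", "Europe"),
  stringsAsFactors = FALSE
)

south_america <- c("Brazil", "Chile")

# Step 1: relabel the listed countries.
toy$continent[toy$country %in% south_america] <- "South America"

# Step 2: whatever is still 'Americas' is treated as North America.
toy$continent[toy$continent == "Americas"] <- "North America"

# Quick check of the resulting groups.
continent_counts <- table(toy$continent)
print(continent_counts)
```

Swapping the two steps would turn every 'Americas' row into 'North America' before the South American countries could be picked out by continent, although the country-based step would still correct them here.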
# Create a tibble for continent and sex.
continent_sex_tibble <- data %>%
select(continent, sex, suicides_no, population) %>%
group_by(continent, sex) %>%
summarize(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))
## `summarise()` has grouped output by 'continent'. You can override using the
## `.groups` argument.
# Create column chart of suicides by continent and sex.
highchart() %>%
hc_add_series(continent_sex_tibble, hcaes(x = continent, y = suicide_capita, group = sex), type = "column") %>%
hc_colors(colors = sex_color) %>%
hc_title(text = "Suicides by continent and <b>Gender</b>", style = (list(fontSize = '14px'))) %>%
hc_subtitle(text = "1985-2015") %>%
hc_tooltip(borderWidth = 1.5, pointFormat = paste("Gender: <b> {point.sex} </b> <br> Suicides: <b>{point.y}</b>")) %>%
hc_xAxis(categories = c("Africa", "Asia", "Europe", "North <br> America", "Oceania", "South <br> America"), labels = list(style = list(fontSize = 8))) %>%
hc_yAxis(labels = list(style = list(fontSize = 10)),
title = list(text = "Suicides per 100K people",
style = list(fontSize = 10)),
plotLines = list(
list(color = "black", width = 1, dashStyle = "Dash",
value = mean(overall_tibble$suicide_capita),
label = list(text = "Mean = 13.12", style = list(color = "black", fontSize = 6))))) %>%
hc_legend(verticalAlign = 'top', enabled = FALSE) %>%
hc_add_theme(custom_theme)
# Create a tibble for continent and age.
continent_age_tibble <- data %>%
select(continent, age, suicides_no, population) %>%
group_by(continent, age) %>%
summarize(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))
## `summarise()` has grouped output by 'continent'. You can override using the
## `.groups` argument.
# Create column chart of suicides by continent and age.
highchart() %>%
hc_add_series(continent_age_tibble, hcaes(x = continent, y = suicide_capita, group = age), type = "column") %>%
hc_colors(colors = age_color) %>%
hc_title(text = "Suicides by continent and <b>Age</b>", style = (list(fontSize = '14px'))) %>%
hc_subtitle(text = "1985-2015") %>%
hc_tooltip(borderWidth = 1.5, pointFormat = paste("Age: <b> {point.age} </b> <br> Suicides: <b>{point.y}</b>")) %>%
hc_xAxis(categories = c("Africa", "Asia", "Europe", "North <br> America", "Oceania", "South <br> America"), labels = list(style = list(fontSize = 8))) %>%
hc_yAxis(labels = list(style = list(fontSize = 10)),
title = list(text = "Suicides per 100K people",
style = list(fontSize = 10)),
plotLines = list(
list(color = "black", width = 1, dashStyle = "Dash",
value = mean(overall_tibble$suicide_capita),
label = list(text = "Mean = 13.12", style = list(color = "black", fontSize = 6))))) %>%
hc_legend(verticalAlign = 'top', enabled = FALSE) %>%
hc_add_theme(custom_theme)
# Create tibble for overall suicides by country
country_bar <- data %>%
select(country, suicides_no, population) %>%
group_by(country) %>%
summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2)) %>%
arrange(desc(suicide_capita))
# Create interactive bar plot
highchart() %>%
hc_add_series(country_bar, hcaes(x = country, y = suicide_capita, color = suicide_capita), type = "bar") %>%
hc_tooltip(borderWidth = 1.5,
pointFormat = paste("Suicides: <b>{point.y}</b>")) %>%
hc_legend(enabled = FALSE) %>%
hc_title(text = "Suicides by country") %>%
hc_subtitle(text = "1985-2015") %>%
hc_xAxis(categories = country_bar$country,
labels = list(step = 1),
min = 0, max = 25,
scrollbar = list(enabled = TRUE)) %>%
hc_yAxis(title = list(text = "Suicides per 100K people")) %>%
hc_plotOptions(bar = list(stacking = "normal",
pointPadding = 0, groupPadding = 0, borderWidth = 0.5)) %>%
hc_add_theme(custom_theme)
It took some research to learn that I could create an interactive world map with highcharter and worldgeojson. I also tried to make a world map at the continent level, but I have not been able to complete it.
# Create a tibble with suicide per capita by country for 1985-2015.
country_tibble <- data %>%
select(country, suicides_no, population) %>%
group_by(country) %>%
summarize(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))
# Create interactive world map.
highchart() %>%
hc_add_series_map(worldgeojson, country_tibble, value = "suicide_capita", joinBy = c('name','country')) %>%
hc_colorAxis(stops = color_stops()) %>%
hc_title(text = "Suicides by Country") %>%
hc_subtitle(text = "1985-2015") %>%
hc_tooltip(borderWidth = 1.5, headerFormat = "", valueSuffix = " suicides (per 100K people)") %>%
hc_add_theme(custom_theme)
From the visualizations, it is clear that Europe has been hit hardest by suicide. To my surprise, Oceania also has high rates, especially among young adults in their 20s. The world map also shows that this dataset does not cover every country, which helps explain why Africa's suicide rate appears so low. The high rates are concentrated in European countries, and they are notably high in Lithuania and Hungary. I believe this may be connected to the economic or social hardships those countries have been through. It is also important to note that males are affected far more than females.
In conclusion, this dataset sheds light on where, and among which groups, suicide is most prevalent, and it invites the question of why. I am pleased that I was able to use highcharter effectively with a new dataset by reviewing the week 8 materials. I would like to learn to use the interactive world map better, but I am happy that I managed to build one at the country level.