In 2021, the World Health Organization (WHO) published a fact sheet on suicide worldwide on its official website. According to the report, an estimated 703,000 people take their own lives every year, and even more attempt to do so. Suicide has become a major public health problem, and although it is most common in low- and middle-income countries, it also occurs in high-income countries. Eastern Europe, for example, is the region with the highest suicide rates for both men and women. In this report, I aim to analyse suicide rates in around 100 countries using the “Suicide Rates Overview 1985 to 2016” dataset that I obtained from Kaggle. I limit the analysis to the years 1995 to 2015. Several variables, such as GDP per capita, are excluded from the analysis, since I focus only on the trends in suicide rates in each country.
library(lubridate)
library(dplyr)
library(ggplot2)
library(stringr)
library(scales)
library(plotly)
library(glue)
suicide <- read.csv('Suicide Rates Overview 1985 to 2016.csv')
head(suicide)
Columns description:
Check dimensions.
dim(suicide)
## [1] 27820 12
We have 27820 observations and 12 columns.
Check data types for all columns.
str(suicide)
## 'data.frame': 27820 obs. of 12 variables:
## $ ï..country : chr "Albania" "Albania" "Albania" "Albania" ...
## $ year : int 1987 1987 1987 1987 1987 1987 1987 1987 1987 1987 ...
## $ sex : chr "male" "male" "female" "male" ...
## $ age : chr "15-24 years" "35-54 years" "15-24 years" "75+ years" ...
## $ suicides_no : int 21 16 14 1 9 1 6 4 1 0 ...
## $ population : int 312900 308000 289700 21800 274300 35600 278800 257200 137500 311000 ...
## $ suicides.100k.pop : num 6.71 5.19 4.83 4.59 3.28 2.81 2.15 1.56 0.73 0 ...
## $ country.year : chr "Albania1987" "Albania1987" "Albania1987" "Albania1987" ...
## $ HDI.for.year : num NA NA NA NA NA NA NA NA NA NA ...
## $ gdp_for_year.... : chr "2,156,624,900" "2,156,624,900" "2,156,624,900" "2,156,624,900" ...
## $ gdp_per_capita....: int 796 796 796 796 796 796 796 796 796 796 ...
## $ generation : chr "Generation X" "Silent" "Generation X" "G.I. Generation" ...
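The first column name above, ï..country, is not part of the data. A name like this typically appears when a CSV file starts with a UTF-8 byte-order mark (BOM) that read.csv() folds into the header. The sketch below assumes the Kaggle file is encoded that way. It writes a tiny BOM-prefixed file (the contents and the names tmp and demo are made up for illustration) and reads it with fileEncoding = "UTF-8-BOM" so that the header comes out clean.

```r
# Write a tiny CSV that starts with a UTF-8 byte-order mark (BOM).
# Assumption: the Kaggle file is encoded the same way.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)                    # the BOM bytes
writeBin(charToRaw("country,year\nAlbania,1987\n"), con)      # made-up rows
close(con)

# Declaring the encoding lets read.csv() strip the BOM from the header.
demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(demo)

unlink(tmp)
```

With a clean header, the rename of ï..country in the cleaning step would no longer be needed.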
suicide$gdp_for_year.... <- str_replace_all(suicide$gdp_for_year...., ",", "")
suicide <- suicide %>%
  rename(country = ï..country,
         gdp.for.year = gdp_for_year....,
         gdp.per.capita = gdp_per_capita....) %>%
  mutate_at(.vars = c("country", "sex", "age", "generation"), .funs = as.factor) %>%
  mutate(gdp.for.year = as.numeric(gdp.for.year),
         year = as.character(year),
         year = as.Date(year, "%Y"),
         year = year(year))
str(suicide)
## 'data.frame': 27820 obs. of 12 variables:
## $ country : Factor w/ 101 levels "Albania","Antigua and Barbuda",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : num 1987 1987 1987 1987 1987 ...
## $ sex : Factor w/ 2 levels "female","male": 2 2 1 2 2 1 1 1 2 1 ...
## $ age : Factor w/ 6 levels "15-24 years",..: 1 3 1 6 2 6 3 2 5 4 ...
## $ suicides_no : int 21 16 14 1 9 1 6 4 1 0 ...
## $ population : int 312900 308000 289700 21800 274300 35600 278800 257200 137500 311000 ...
## $ suicides.100k.pop: num 6.71 5.19 4.83 4.59 3.28 2.81 2.15 1.56 0.73 0 ...
## $ country.year : chr "Albania1987" "Albania1987" "Albania1987" "Albania1987" ...
## $ HDI.for.year : num NA NA NA NA NA NA NA NA NA NA ...
## $ gdp.for.year : num 2.16e+09 2.16e+09 2.16e+09 2.16e+09 2.16e+09 ...
## $ gdp.per.capita : int 796 796 796 796 796 796 796 796 796 796 ...
## $ generation : Factor w/ 6 levels "Boomers","G.I. Generation",..: 3 6 3 2 1 2 6 1 2 3 ...
Now every column is stored with the correct data type.
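The comma removal matters because as.numeric() cannot parse strings that contain thousands separators. A minimal illustration in base R, using gsub() in place of str_replace_all() so the snippet needs no packages (the vector names are hypothetical):

```r
gdp_raw <- c("2,156,624,900", "796")

# Direct conversion fails for the value with commas (NA, with a warning).
direct <- suppressWarnings(as.numeric(gdp_raw))

# Stripping the commas first gives a usable number.
gdp_clean <- as.numeric(gsub(",", "", gdp_raw, fixed = TRUE))

is.na(direct)
gdp_clean
```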
head(suicide)
Check for missing values.
colSums(is.na(suicide))
## country year sex age
## 0 0 0 0
## suicides_no population suicides.100k.pop country.year
## 0 0 0 0
## HDI.for.year gdp.for.year gdp.per.capita generation
## 19456 0 0 0
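HDI.for.year is missing in 19,456 of the 27,820 rows, roughly 70% of the data, which is presumably why the column is dropped rather than imputed. The sketch below shows how the missing share per column could be computed and used as a drop rule. The toy data frame and the 50% cut-off are assumptions for illustration, not part of the original analysis.

```r
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  hdi     = c(NA, 0.70, NA, NA)
)

# Share of missing values in each column.
na_share <- colMeans(is.na(toy))

# Keep only the columns where at most half of the values are missing.
toy_clean <- toy[, na_share <= 0.5, drop = FALSE]

# The same share for HDI.for.year, from the counts reported above.
hdi_share <- 19456 / 27820
round(hdi_share, 3)
```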
suicide <- subset(suicide, select = -c(HDI.for.year))
suicide
Our data contains records dating back to the 1980s. However, our main interest here is the period from 1995 to 2015 only, so we subset the suicide cases recorded in those years.
decade <- suicide %>%
  filter(year >= 1995,
         year <= 2015)
decade
Several research questions that we want to answer:
Number of suicide cases from 1995 to 2015, by sex
sex <- decade %>%
  group_by(year, sex) %>%
  summarise(suicides_no = sum(suicides_no))
sex
line <- c("grey", "firebrick")
sex_plot <- ggplot(sex, aes(
  x = year,
  y = suicides_no,
  color = sex,
  group = sex,
  text = glue("Sex: {sex}
Year: {year}
Total of suicides: {comma(suicides_no)}"))) +
  geom_line(size = 1) +
  scale_color_manual(values = line) +
  labs(x = NULL,
       y = NULL,
       title = 'Total of suicides based on sex') +
  scale_y_continuous(label = comma_format()) +
  theme_minimal()
sex_plot
Regardless of which year we look at, the number of suicides among males remains about three times higher than among females. Female suicide deaths ranged from roughly 40,000 to 58,000 per year, while roughly 150,000 to 200,000 males died by suicide each year.
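The "about three times" figure can be checked directly by dividing the male total by the female total in each year. A minimal sketch, assuming a summary table shaped like sex above. The numbers here are invented for illustration and are not taken from the dataset.

```r
sex_toy <- data.frame(
  year        = c(1995, 1995, 1996, 1996),
  sex         = c("female", "male", "female", "male"),
  suicides_no = c(50000, 150000, 52000, 182000)   # made-up totals
)

# One row per year and one column per sex (base R, no packages needed).
totals <- tapply(sex_toy$suicides_no, list(sex_toy$year, sex_toy$sex), sum)

# Male-to-female ratio for each year.
male_to_female <- totals[, "male"] / totals[, "female"]
male_to_female
```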
# number of suicide cases from 1995 to 2015, by generation
gen <- decade %>%
  group_by(generation) %>%
  summarise(suicides_total = sum(suicides_no)) %>%
  arrange(desc(suicides_total))
gen
gen_plot <- gen %>%
  ggplot(aes(x = reorder(generation, -suicides_total), y = suicides_total)) +
  geom_col(fill = 'firebrick') +
  labs(x = NULL,
       y = NULL,
       title = 'Total of suicides based on generations',
       subtitle = 'from 1995-2015') +
  scale_y_continuous(label = comma_format()) +
  theme_minimal()
gen_plot
Top 5 countries with the highest number of suicide cases from 1995 to 2015 (not adjusted for population).
country_sn <- decade %>%
  group_by(country) %>%
  summarise(suicides_total = sum(suicides_no)) %>%
  arrange(desc(suicides_total)) %>%
  top_n(5)
## Selecting by suicides_total
country_sn
country_sn_plot <- country_sn %>%
  ggplot(aes(x = reorder(country, -suicides_total),
             y = suicides_total,
             text = glue("Country:{country}
Total suicides: {comma(suicides_total)}"))) +
  geom_col(fill = 'firebrick') +
  labs(x = NULL,
       y = NULL,
       title = 'Countries with the highest number of suicides',
       subtitle = 'from 1995-2015') +
  scale_y_continuous(label = comma_format()) +
  theme_minimal()
country_sn_plot
Russia, the US, Japan, South Korea and Ukraine are the five countries with the highest number of suicide cases from 1995 to 2015. Next, we want to see whether those same countries also appear among the 20 countries with the highest suicide rates once population size is taken into account.
The plot above only counts total suicides between 1995 and 2015 and does not take into account differences in population size between countries. This can be problematic, as countries with larger populations are likely to have higher suicide totals than those with smaller populations. That is why, in the next section, we divide the total number of suicides that occurred in a country by its population.
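As a small worked example of why raw totals mislead, consider two hypothetical countries (the figures are made up). The larger one records more suicides in total, yet the smaller one has the higher rate once the count is divided by population. Rates are expressed here per 100,000 people, the convention behind the dataset's suicides.100k.pop column. The analysis in this report multiplies by 100 instead, which changes the scale but not the ranking.

```r
toy <- data.frame(
  country    = c("Large", "Small"),
  suicides   = c(20000, 3000),     # made-up counts
  population = c(200e6, 10e6)      # made-up populations
)

# Suicides per 100,000 inhabitants.
toy$rate_per_100k <- toy$suicides / toy$population * 1e5

# Rank by rate rather than by raw count.
toy[order(-toy$rate_per_100k), ]
```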
# aggregate each country's total suicides and population, then compute the ratio
country_rat <- decade %>%
  group_by(country) %>%
  summarise(total_suicides = sum(suicides_no),
            pop = sum(population)) %>%
  mutate(ratio = (total_suicides / pop)*100) %>%
  arrange(desc(ratio)) %>%
  top_n(20)
## Selecting by ratio
country_rat
country_rat %>%
  ggplot(aes(x = ratio,
             y = reorder(country, ratio))) +
  geom_col(aes(fill = ratio)) +
  scale_fill_gradient(low = 'darkgrey', high = 'firebrick') +
  labs(x = NULL,
       y = NULL,
       title = 'Suicides ratio per population') +
  theme_minimal()
According to the bar plot above, four of the five countries with the highest number of suicide cases (Russia, South Korea, Japan, and Ukraine) still rank high once the ratio of suicides to population is considered. They all appear among the 20 countries with the highest suicide-to-population ratio. The United States is the only one of the five that did not make the top 20. Next, we inspect the data from those five countries more closely.
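Selecting several countries at once calls for %in% rather than ==. With ==, R recycles the right-hand vector along the column and compares element by element, so matching rows can be dropped silently. A minimal sketch on a made-up vector:

```r
country <- c("Japan", "Ukraine", "Japan", "Ukraine", "Japan", "Ukraine")
wanted  <- c("Ukraine", "Japan")

# == compares position by position against the recycled `wanted` vector,
# so the result depends on row order, not on membership.
eq_hits <- sum(country == wanted)

# %in% tests membership, which is what a filter on several names needs.
in_hits <- sum(country %in% wanted)

c(equals = eq_hits, in_operator = in_hits)
```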
# subset data from the top 5 countries
top_five <- decade %>%
  filter(country %in% c('Russian Federation', 'Japan', 'Ukraine', 'United States', 'Republic of Korea'))
top_five
levels(top_five$country)
## [1] "Albania" "Antigua and Barbuda"
## [3] "Argentina" "Armenia"
## [5] "Aruba" "Australia"
## [7] "Austria" "Azerbaijan"
## [9] "Bahamas" "Bahrain"
## [11] "Barbados" "Belarus"
## [13] "Belgium" "Belize"
## [15] "Bosnia and Herzegovina" "Brazil"
## [17] "Bulgaria" "Cabo Verde"
## [19] "Canada" "Chile"
## [21] "Colombia" "Costa Rica"
## [23] "Croatia" "Cuba"
## [25] "Cyprus" "Czech Republic"
## [27] "Denmark" "Dominica"
## [29] "Ecuador" "El Salvador"
## [31] "Estonia" "Fiji"
## [33] "Finland" "France"
## [35] "Georgia" "Germany"
## [37] "Greece" "Grenada"
## [39] "Guatemala" "Guyana"
## [41] "Hungary" "Iceland"
## [43] "Ireland" "Israel"
## [45] "Italy" "Jamaica"
## [47] "Japan" "Kazakhstan"
## [49] "Kiribati" "Kuwait"
## [51] "Kyrgyzstan" "Latvia"
## [53] "Lithuania" "Luxembourg"
## [55] "Macau" "Maldives"
## [57] "Malta" "Mauritius"
## [59] "Mexico" "Mongolia"
## [61] "Montenegro" "Netherlands"
## [63] "New Zealand" "Nicaragua"
## [65] "Norway" "Oman"
## [67] "Panama" "Paraguay"
## [69] "Philippines" "Poland"
## [71] "Portugal" "Puerto Rico"
## [73] "Qatar" "Republic of Korea"
## [75] "Romania" "Russian Federation"
## [77] "Saint Kitts and Nevis" "Saint Lucia"
## [79] "Saint Vincent and Grenadines" "San Marino"
## [81] "Serbia" "Seychelles"
## [83] "Singapore" "Slovakia"
## [85] "Slovenia" "South Africa"
## [87] "Spain" "Sri Lanka"
## [89] "Suriname" "Sweden"
## [91] "Switzerland" "Thailand"
## [93] "Trinidad and Tobago" "Turkey"
## [95] "Turkmenistan" "Ukraine"
## [97] "United Arab Emirates" "United Kingdom"
## [99] "United States" "Uruguay"
## [101] "Uzbekistan"
I subset the data frame (decade) into a new data frame (top_five) that was supposed to contain only five countries. However, all of the countries from the original list are still present as factor levels. I'll remove the unused levels from top_five using droplevels().
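This behaviour is easy to reproduce on a small factor: subsetting keeps every original level, and droplevels() removes the ones that no longer occur. The vector below is illustrative only.

```r
country <- factor(c("Albania", "Japan", "Ukraine", "Japan"))

# Subsetting a factor keeps all of its original levels.
subset_country <- country[country %in% c("Japan", "Ukraine")]
levels(subset_country)

# droplevels() keeps only the levels that are still present.
trimmed <- droplevels(subset_country)
levels(trimmed)
```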
top_five$country <- droplevels(top_five$country)
levels(top_five$country)
## [1] "Japan" "Republic of Korea" "Russian Federation"
## [4] "Ukraine" "United States"
Good. Now there are only five country levels in the top_five data frame. Next, let's find the ratio between the number of suicide cases and the population for each country from 1995 to 2015.
top_five_trend <- top_five %>%
  group_by(country, year) %>%
  summarise(suicide_num = sum(suicides_no),
            population = sum(population),
            ratio = (suicide_num / population)*100)
top_five_trend
# highlight the country whose suicide ratio increased between 1995 and 2015
cols <- c("gray", "firebrick", "grey", "grey", "grey")
five_trend_plot <- top_five_trend %>%
  ggplot(aes(x = year,
             y = ratio,
             color = country,
             group = country)) +
  geom_line(aes(color = country), size = 1) +
  scale_color_manual(values = cols) +
  labs(title = "Suicide ratio trends in five countries",
       x = NULL,
       y = NULL) +
  theme_minimal()
five_trend_plot
From the line chart above we can see the trends in suicide rates in each of the five countries from 1995 to 2015. The country that I want to highlight here is South Korea (Republic of Korea). Its suicide rate has risen, and it fluctuates slightly more than in the other four countries, where the rate shows a general decline or stagnation over the period. The suicide ratio in South Korea in 2009 is three times higher than in 1995.
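A claim such as "three times higher than in 1995" is easiest to verify by indexing each country's ratio to its 1995 value. A sketch in base R, assuming a table shaped like top_five_trend. The country names and ratios below are invented for illustration.

```r
trend_toy <- data.frame(
  country = rep(c("Country A", "Country B"), each = 3),
  year    = rep(c(1995, 2002, 2009), times = 2),
  ratio   = c(0.010, 0.020, 0.030, 0.040, 0.035, 0.030)   # made-up ratios
)

# Sort so that the first row of each country is its earliest year.
trend_toy <- trend_toy[order(trend_toy$country, trend_toy$year), ]

# Divide every ratio by the same country's first (1995) value.
baseline <- ave(trend_toy$ratio, trend_toy$country, FUN = function(x) x[1])
trend_toy$index_1995 <- trend_toy$ratio / baseline
trend_toy
```

An index of 3 in a later year means the ratio is three times its 1995 level, while values below 1 indicate a decline.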
In general, the ratio of suicides to population in the top 5 countries is declining, except in South Korea.
By highlighting a single line in the chart, it is much easier for the audience to focus on the country that contrasts with the general trend. In this case, South Korea is the only country with an increasing suicide ratio, while the other four countries have managed to bring their ratios down.
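One caveat with this technique: an unnamed colour vector is matched to countries by factor-level order, so the highlight can land on the wrong line if the levels change. A sketch of building a named vector instead, which scale_color_manual() matches by name. Here highlight_cols is a hypothetical name chosen for illustration.

```r
countries <- c("Japan", "Republic of Korea", "Russian Federation",
               "Ukraine", "United States")

# Start with grey for every country, then override the one to highlight.
highlight_cols <- setNames(rep("grey", length(countries)), countries)
highlight_cols["Republic of Korea"] <- "firebrick"
highlight_cols

# Usage in the plot: scale_color_manual(values = highlight_cols)
```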
It is also interesting that South Korea begins as the country with the lowest suicide ratio of the five in 1995 but goes on to rank first, with the highest ratio of suicide cases to population.
single_country <- decade %>%
  # %in% keeps every row for both countries; == would recycle the vector and drop rows
  filter(country %in% c("Russian Federation", "Republic of Korea")) %>%
  group_by(country, year) %>%
  summarise(total_suicides = sum(suicides_no),
            pop = sum(population),
            ratio = (total_suicides / pop)*100)
## `summarise()` has grouped output by 'country'. You can override using the `.groups` argument.
single_country
cols <- c("firebrick", "grey")
single_country %>%
  ggplot(aes(x = year,
             y = ratio,
             color = country,
             group = country)) +
  geom_line(aes(color = country), size = 1) +
  scale_color_manual(values = cols) +
  labs(title = "Comparison of suicide ratio trends",
       subtitle = "In South Korea & Russia between 1995-2015",
       x = NULL,
       y = NULL) +
  theme_minimal()
The trends in the two countries appear to move in opposite directions: the ratio rises in South Korea while it declines in Russia.