library(tidyverse)
library(highcharter)
Have you ever known someone who looks happy on the outside but feels deeply lonely inside? Or have you been a fan of an idol or artist who always seemed happy, only to read the next day that they had ended their own life? We do not always know how we feel ourselves, let alone how the people around us feel, but suicide is not an option.
As we know, mental illness is a serious condition that many of us fail to notice at first, and it can lead someone to end their life by suicide. In fact, 95% of people who die by suicide have a mental illness [2]. Mental illness is not the only cause: economic circumstances also play a part in some cases. In 2016, Guyana had the highest suicide rate in the world, and most of those cases were attributed to economic circumstances [3]. There are many factors, and they behave like dominoes: each is tied to others that we might never expect to influence, or even lead to, someone’s death.
This data set tells the story of suicide rates from 1985 to 2016 in 101 countries. It contains GDP per country, the sex and age group of the people recorded, and more. This time, I will try to forecast the influence of several factors on the suicide rate and do some clustering. First, we load the data set into an object named suicide and take a peek before moving on to the EDA section.
suicide <- read.csv("master.csv")
head(suicide)
glimpse(suicide)
## Rows: 27,820
## Columns: 12
## $ ï..country <chr> "Albania", "Albania", "Albania", "Albania", "Alb...
## $ year <int> 1987, 1987, 1987, 1987, 1987, 1987, 1987, 1987, ...
## $ sex <chr> "male", "male", "female", "male", "male", "femal...
## $ age <chr> "15-24 years", "35-54 years", "15-24 years", "75...
## $ suicides_no <int> 21, 16, 14, 1, 9, 1, 6, 4, 1, 0, 0, 0, 2, 17, 1,...
## $ population <int> 312900, 308000, 289700, 21800, 274300, 35600, 27...
## $ suicides.100k.pop <dbl> 6.71, 5.19, 4.83, 4.59, 3.28, 2.81, 2.15, 1.56, ...
## $ country.year <chr> "Albania1987", "Albania1987", "Albania1987", "Al...
## $ HDI.for.year <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ gdp_for_year.... <chr> "2,156,624,900", "2,156,624,900", "2,156,624,900...
## $ gdp_per_capita.... <int> 796, 796, 796, 796, 796, 796, 796, 796, 796, 796...
## $ generation <chr> "Generation X", "Silent", "Generation X", "G.I. ...
The data set has 27,820 rows and 12 columns. Each row is one combination of country, year, sex, and age group rather than an individual case, so the data set holds 27,820 such groups. Some columns have an incorrect data type (for example, the GDP-for-year column is stored as text), so we need to tidy them up.
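The column name ï..country in the glimpse() output most likely comes from a UTF-8 byte-order mark at the start of the file, and the GDP-for-year column is read as text because of its thousands separators. A minimal sketch of one way to handle both at read time follows. It assumes the file really does start with a byte-order mark; the temporary file, the simplified gdp_for_year header, and the name demo are made up for illustration and are not part of the original analysis.

```r
# Build a tiny temporary CSV that stands in for master.csv (hypothetical content)
tmp <- tempfile(fileext = ".csv")
csv_text <- paste(
  "country,year,gdp_for_year",
  "Albania,1987,\"2,156,624,900\"",
  "",
  sep = "\n"
)

# Prepend the three UTF-8 byte-order-mark bytes, as some exported CSV files do
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw(csv_text)), tmp)

# fileEncoding = "UTF-8-BOM" asks R to drop the byte-order mark while reading
demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(demo)

# Strip the thousands separators, then convert the GDP text to numeric
demo$gdp_for_year <- as.numeric(gsub(",", "", demo$gdp_for_year))
str(demo)

unlink(tmp)
```

If the first column then arrives already named country, the later rename step would no longer be needed.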
Before pre-processing, we will verify some columns.
Verify the country
# verify the country
head(data.frame(table(suicide$ï..country)))
Verify the year
# verify the year
head(data.frame(table(suicide$year)))
Verify the generation
# verify the generation
data.frame(table(suicide$generation))
The tables above show that country, year, and generation hold a limited set of repeated values, so they, together with sex and age, should be converted to factors.
suicide <- suicide %>%
mutate(ï..country = as.factor(ï..country),
year = as.factor(year),
sex = as.factor(sex),
age = as.factor(age),
generation = as.factor(generation)) %>%
rename(country = ï..country)
Before going further, let’s check whether the data set has any missing values.
colSums(is.na(suicide))
##            country               year                sex                age
##                  0                  0                  0                  0
##        suicides_no         population  suicides.100k.pop       country.year
##                  0                  0                  0                  0
##       HDI.for.year   gdp_for_year.... gdp_per_capita....         generation
##              19456                  0                  0                  0
Most columns have no missing values. Only HDI.for.year does, with 19,456 missing entries out of 27,820 rows (about 70%). For now, we will leave it as it is and move on to EDA and visualization.
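A count of missing values is easier to judge as a share of all rows. The sketch below computes that share per column on a small made-up data frame; toy and missing_share are hypothetical names, and only the column names are borrowed from the data set.

```r
# Toy data frame: the column names mirror the data set, the values are invented
toy <- data.frame(
  suicides_no  = c(21, 16, 14, 1),
  HDI.for.year = c(NA, NA, 0.62, NA)
)

# is.na() gives a logical matrix; colMeans() turns it into the fraction
# of missing values per column, shown here as a percentage
missing_share <- round(colMeans(is.na(toy)) * 100, 1)
missing_share
```

Applied to the real data, the same idea would be colMeans(is.na(suicide)).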
Our data set contains several categorical variables. Let’s visualize them.
In this section, the graph below shows the suicide rate by gender.
suicide %>%
group_by(sex) %>%
summarise(total = round(sum(suicides.100k.pop))) %>%
hchart("column",
hcaes(sex, total)) %>%
hc_tooltip(crosshairs = TRUE,
borderWidth = 3.5,
table = TRUE,
headerFormat = "<b>Total Suicide Rate</b>",
pointFormat = paste('<br>{point.sex} : {point.total}')) %>%
hc_title(text = "Total Suicide Rate from 1985 to 2016 by Gender",
style = list(fontWeight = "bold"),
align = "center") %>%
hc_subtitle(text = "per 100.000 Population",
align = "center") %>%
hc_xAxis(title = list(text = "Gender")) %>%
hc_yAxis(title = list(text = "Total Suicide from 1985-2016")) %>%
hc_colors("#a29ecd") %>%
hc_add_theme(hc_theme_ffx())
The graph above shows that from 1985 to 2016, males were more likely to end their lives, with a summed suicide rate of 281,529 per 100,000 population. The total for females is lower. Now, we will look at the suicide rate per year by gender.
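One caveat about the chart above: suicides.100k.pop is already a rate for a single country, year, sex, and age group, so adding those values up gives a number that grows with the number of rows rather than a rate per 100,000 people. A hedged alternative, not used in the original analysis, is to pool the raw counts and divide by the pooled population. The sketch below contrasts the two on invented numbers; toy and rate_by_sex are hypothetical names.

```r
library(dplyr)

# Toy rows: column names follow the data set, the numbers are invented
toy <- data.frame(
  sex               = c("male", "male", "female", "female"),
  suicides_no       = c(20, 5, 4, 1),
  population        = c(100000, 50000, 100000, 50000),
  suicides.100k.pop = c(20, 10, 4, 2)
)

rate_by_sex <- toy %>%
  group_by(sex) %>%
  summarise(
    # what the chart does: add up per-group rates
    summed_rate = sum(suicides.100k.pop),
    # alternative: total deaths over total population, per 100,000
    pooled_rate = sum(suicides_no) / sum(population) * 1e5
  )

rate_by_sex
```

The ranking between the sexes is the same in both columns here, but only pooled_rate can be read as a number of deaths per 100,000 people.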
suicide %>%
group_by(sex, year) %>%
summarise(total = round(sum(suicides.100k.pop))) %>%
hchart("line",
hcaes(year, total,
group = sex)) %>%
hc_tooltip(crosshairs = TRUE,
borderWidth = 3.5,
table = TRUE) %>%
hc_title(text = "Total Suicide Rate per Year by Gender",
style = list(fontWeight = "bold"),
align = "center") %>%
hc_subtitle(text = "per 100.000 population",
align = "center") %>%
hc_xAxis(title = list(text = "Year")) %>%
hc_yAxis(title = list(text = "")) %>%
hc_add_theme(hc_theme_ffx())
The line graph above shows that the male suicide rate rose markedly in 1990 and 1995, while the female rate showed no comparable rise. Both genders show a sharp drop in 2014, 2015, and 2016, especially males.
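A drop at the end of a series like this can reflect fewer countries reporting in the most recent years rather than a real decline. That has not been checked here; the sketch below only shows how such a check could look, using invented rows and the hypothetical names toy and coverage.

```r
library(dplyr)

# Invented rows: three countries report in 2014, two in 2015, one in 2016
toy <- data.frame(
  country = c("A", "B", "C", "A", "B", "A"),
  year    = c(2014, 2014, 2014, 2015, 2015, 2016)
)

# Count how many distinct countries contribute to each year's total
coverage <- toy %>%
  group_by(year) %>%
  summarise(n_countries = n_distinct(country))

coverage
```

On the real data, the same pipeline applied to suicide would show whether 2014 to 2016 cover as many countries as the earlier years.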
Our data set records 6 generations:
- G.I. Generation = people born from 1901 to 1927
- Silent Generation = people born from 1928 to 1945
- Boomers Generation = people born from 1946 to 1964
- Generation X = people born from 1965 to 1980
- Millennials = people born from 1981 to 1996
- Generation Z = people born from 1997 to 2012
Different generations may be affected by suicide to different degrees. Now, we will visualize this.
suicide %>%
group_by(generation, sex) %>%
summarise(total = round(sum(suicides.100k.pop))) %>%
arrange(-total) %>%
hchart("column",
hcaes(generation, total,
group = sex),
stacking = "normal") %>%
hc_tooltip(crosshairs = TRUE,
borderWidth = 3.5,
table = TRUE) %>%
hc_title(text = "Total Suicide Rate by Generation From 1985 - 2016",
style = list(fontWeight = "bold"),
align = "center") %>%
hc_subtitle(text = "per 100.000 population",
align = "center") %>%
hc_yAxis(title = list(text = "")) %>%
hc_xAxis(title = list(text = "Generation")) %>%
hc_colors(c("#0e469a",
"#6db6d9")) %>%
hc_legend(enabled = TRUE) %>%
hc_add_theme(hc_theme_ffx())
The bar graph above shows that the Silent Generation has the highest suicide rate of all generations. The Silent Generation (1928-1945) was born around the time of the Great Depression, a period of global economic chaos [4], and that upheaval might be one cause.
If we exclude the G.I. and Silent Generations, the Boomers have the highest suicide rate of the working-age generations. As for the aging Boomers, there are no large-scale studies yet fleshing out the reasons behind the increase in boomer suicide. Part of it is likely tied to the recent economic downturn, since financial recessions are in general associated with an uptick in suicide [5].
In this section, we will visualize the total suicide rate by age and gender.
suicide %>%
group_by(age, sex) %>%
summarise(total = round(sum(suicides.100k.pop))) %>%
arrange(total) %>%
hchart("line",
hcaes(age, total,
group = sex)) %>%
hc_tooltip(crosshairs = TRUE,
borderWidth = 3.5,
table = TRUE) %>%
hc_title(text = "Total Suicide Rate per Age by Gender From 1985 - 2016",
style = list(fontWeight = "bold"),
align = "center") %>%
hc_subtitle(text = "per 100.000 population",
align = "center") %>%
hc_xAxis(title = list(text = "Age")) %>%
hc_yAxis(title = list(text = "")) %>%
hc_add_theme(hc_theme_ffx())
The line graph above shows that the elderly (75+ years) have a higher suicide rate than any other age group. Elderly males have the highest rate across all age and gender categories. One possible reason is that the elderly are prone to loneliness [6].
Now, we will see what happened around the world in 1995. The earlier line graph showed that the suicide rate peaked in 1995, most clearly for males.
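The map code that follows joins the data to the map by matching the country column against the map's name property, so any country spelled differently in the two sources would be left blank on the map. A quick check along these lines may help before plotting. The two name vectors below are invented examples, not values read from worldgeojson or the data set.

```r
# Hypothetical names: map_names stands in for the map's "name" values,
# data_names for the country column of the summarised data
map_names  <- c("Lithuania", "Russia", "South Korea")
data_names <- c("Lithuania", "Russian Federation", "Republic of Korea")

# Countries present in the data that have no exact match on the map
unmatched <- setdiff(data_names, map_names)
unmatched
```

Any names reported this way would need to be recoded to the map's spelling before the join.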
# load world map
data(worldgeojson, package = "highcharter")
# EDA
world1995 <- suicide %>%
filter(year == "1995") %>%
group_by(country) %>%
summarise(value = round(sum(suicides.100k.pop)))
# Visualization
highchart() %>%
hc_add_series_map(worldgeojson,
world1995,
value = "value",
joinBy = c("name", "country")) %>%
hc_colorAxis(stops = color_stops()) %>%
hc_title(text = "Suicide Rate World Map on 1995",
align = "center",
style = list(fontWeight = "bold")) %>%
hc_subtitle(text = "per 100.000 population") %>%
hc_tooltip(crosshairs = TRUE,
borderWidth = 3.5,
sort = TRUE,
shared = TRUE,
table = TRUE,
pointFormat = paste('<br> Total Suicide Rate:<b> {point.value}</b>'))
The world map above shows that several countries have a higher suicide rate than the rest, and Lithuania has the highest in 1995. The bar graph below shows this more clearly.
world1995 %>%
arrange(-value) %>%
head(10) %>%
hchart("column",
hcaes(country, value)) %>%
hc_tooltip(crosshairs = TRUE,
borderWidth = 3.5,
pointFormat = paste('<br>Total Suicide Rate : <b>{point.value}</b>')) %>%
hc_title(text = "Top 10 Highest Suicide Rate per Countries on 1995",
style = list(fontWeight = "bold"),
align = "center") %>%
hc_subtitle(text = "per 100.000 population",
align = "center") %>%
hc_yAxis(title = list(text = "")) %>%
hc_xAxis(title = list(text = "Country")) %>%
hc_add_theme(hc_theme_ffx()) %>%
hc_colorAxis(minColor = "#e86662",
maxColor = "#0e469a")
The bar graph above shows the ten countries with the highest suicide rates in 1995, led by Lithuania.
So, here are the conclusions we can draw after a glimpse at these visualizations:
Based on the first visualization, males have a higher suicide rate than females. One possible insight is that traditional male gender roles discourage emotional expression. Men are told they need to be tough and that they should not need to ask for help. Such rigid gender norms may make it difficult for men to reach out and ask for support when they need it [8].
In our data set, the suicide rate peaked in 1990 and 1995, and from 2014 onward it decreased across the 101 countries. The second visualization shows a large gap between the male and female suicide rates, and the female rate tends to stay flat.
From 1985 to 2016, the Silent Generation (people born from 1928 to 1945) has the highest suicide rate of all generations in our data set. There may be several reasons for this, and being born around the Great Depression (global economic chaos) might be one of them.
This data set records that the elderly tend to have the highest suicide rate of all age categories. Loneliness could lead the elderly to decide to end their lives. Again, elderly males are more likely to end their lives by suicide than elderly females.
After a glimpse at the world map of suicide rates in 1995, we found that several countries tend to have higher suicide rates than others. These numbers are accumulated across gender and age categories. After sorting to the top 10 countries with the highest suicide rates in 1995, Lithuania comes first with a summed rate of nearly 700 per 100,000 people.
In closing, there are many reasons why people end their lives by suicide. Economic factors appear to lead people to suicide more often than other factors. But once again, the factors behind suicidal thoughts are like dominoes. Mental illness, access to lethal chemicals, alcohol misuse, interpersonal violence, family dysfunction, and insufficient mental health resources are key factors that could lead someone to have thoughts of suicide or even to act on them [9].
[1] https://www.kaggle.com/russellyates88/suicide-rates-overview-1985-to-2016
[3] https://www.nami.org/About-Mental-Illness/Common-with-Mental-Illness/Risk-of-Suicide
[6] https://www.psychologytoday.com/us/blog/understanding-grief/202001/why-do-the-elderly-commit-suicide
[7] https://en.wikipedia.org/wiki/Suicide_in_Lithuania