Suicide Data Analysis with Visualization

Dedy Gusnadi Sianipar

3/30/2021

Set Up

Load Library

library(tidyverse)
library(ggplot2) # plot
options(scipen = 9999)
options(dplyr.summarise.inform = FALSE)
options(dplyr.selecting.inform = FALSE)

Import Data

suicide <- read.csv("data_input/who_suicide_statistics.csv")
glimpse(suicide)
## Rows: 43,776
## Columns: 6
## $ country     <chr> "Albania", "Albania", "Albania", "Albania", "Albania", "Al~
## $ year        <int> 1985, 1985, 1985, 1985, 1985, 1985, 1985, 1985, 1985, 1985~
## $ sex         <chr> "female", "female", "female", "female", "female", "female"~
## $ age         <chr> "15-24 years", "25-34 years", "35-54 years", "5-14 years",~
## $ suicides_no <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
## $ population  <int> 277900, 246800, 267500, 298300, 138700, 34200, 301400, 264~

Several columns, such as sex, country, and age, are read in as character vectors although they hold categorical data, so they should be converted to factors.

Data Inspection

  1. Change Data Type
df <- suicide %>%
  mutate_if(is.character,as.factor)
  2. Check missing values
colSums(is.na(suicide))
##     country        year         sex         age suicides_no  population 
##           0           0           0           0        2256        5460

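The suicides_no column has 2256 missing values, which we will fill with 0. The population column is dropped in the same step because it is not used in the rest of this analysis.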

df1 <- df %>% 
  select(-population) %>% 
  mutate(suicides_no =ifelse(is.na(suicides_no), yes = 0, no = suicides_no))

Check for missing values again after filling with 0

colSums(is.na(df1))
##     country        year         sex         age suicides_no 
##           0           0           0           0           0
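
Filling missing counts with 0 leaves group totals unchanged but lowers averages, because each filled row is counted as a real observation of zero. The sketch below illustrates this on a small made-up data frame (the `toy` data and its values are hypothetical and not taken from the WHO file); it compares the fill-with-0 approach used here against simply skipping missing values with `na.rm = TRUE`.

```r
library(dplyr)

# Hypothetical toy data, only for illustration
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  suicides_no = c(10, NA, 30, 20),
  stringsAsFactors = FALSE
)

# Approach used in this report: replace NA with 0, then aggregate
filled <- toy %>%
  mutate(suicides_no = ifelse(is.na(suicides_no), 0, suicides_no)) %>%
  group_by(country) %>%
  summarise(total = sum(suicides_no), avg = mean(suicides_no))

# Alternative: keep NA in the data and skip it while aggregating
skipped <- toy %>%
  group_by(country) %>%
  summarise(total = sum(suicides_no, na.rm = TRUE),
            avg = mean(suicides_no, na.rm = TRUE))

filled
skipped
```

The totals match in both versions, while the average for country A differs, so the choice matters mainly when reading means and medians.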

To make things easier, I renamed the column suicides_no to suicides

df2 <- df1 %>% rename(suicides = suicides_no)
str(df2)
## 'data.frame':    43776 obs. of  5 variables:
##  $ country : Factor w/ 141 levels "Albania","Anguilla",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year    : int  1985 1985 1985 1985 1985 1985 1985 1985 1985 1985 ...
##  $ sex     : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 2 2 2 2 ...
##  $ age     : Factor w/ 6 levels "15-24 years",..: 1 2 3 4 5 6 1 2 3 4 ...
##  $ suicides: num  0 0 0 0 0 0 0 0 0 0 ...

Data Explanation

Now I'll summarize each column to gain insights from the data

summary(df2)
##         country           year          sex                 age      
##  Austria    :  456   Min.   :1979   female:21888   15-24 years:7296  
##  Hungary    :  456   1st Qu.:1990   male  :21888   25-34 years:7296  
##  Iceland    :  456   Median :1999                  35-54 years:7296  
##  Israel     :  456   Mean   :1999                  5-14 years :7296  
##  Mauritius  :  456   3rd Qu.:2007                  55-74 years:7296  
##  Netherlands:  456   Max.   :2016                  75+ years  :7296  
##  (Other)    :41040                                                   
##     suicides      
##  Min.   :    0.0  
##  1st Qu.:    0.0  
##  Median :   11.0  
##  Mean   :  183.4  
##  3rd Qu.:   83.0  
##  Max.   :22338.0  
## 

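Here are the insights we can get after we summarize the data:

  1. The earliest year recorded in the data is 1979.
  2. The mean number of suicides per row (one country, year, sex, and age group) is 183.4. This is an average count, not a rate.
  3. The highest number of suicides in a single row is 22338.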

Visualization

Suicide cases based on years

df2 %>% 
  group_by(year) %>%
  summarise(kasus = sum(suicides)) %>% 
  # a fixed colour is set in geom_line(); inside aes() it would be mapped as data
  ggplot(aes(x = year, y = kasus)) +
  geom_line(size = 1.5, colour = "red") +
  labs(title = "Graph Year-Based Suicide",
       subtitle = "from 1979 until 2016",
       y = "Total Suicides",
       x = "Year")
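
When a constant such as "red" is passed inside aes(), ggplot2 treats it as a data value with a single level rather than as a colour name, so the line is drawn in the first colour of the default palette and a legend is created. The sketch below shows the difference on a small made-up data frame (`toy` and its values are hypothetical); `ggplot_build()` is used only to inspect the colour that each plot actually resolves to.

```r
library(ggplot2)

# Hypothetical toy data, only for illustration
toy <- data.frame(year = 2001:2004, kasus = c(5, 8, 6, 9))

# Constant inside aes(): mapped like a variable, colour comes from the palette
p_mapped <- ggplot(toy, aes(x = year, y = kasus, col = "red")) +
  geom_line()

# Constant outside aes(): used literally as the line colour, no legend needed
p_fixed <- ggplot(toy, aes(x = year, y = kasus)) +
  geom_line(colour = "red")

# Inspect the colour each layer ends up using
mapped_colour <- unique(ggplot_build(p_mapped)$data[[1]]$colour)
fixed_colour  <- unique(ggplot_build(p_fixed)$data[[1]]$colour)

mapped_colour
fixed_colour
```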

Sex-based suicides

df2 %>% 
  group_by(sex, year) %>% 
  summarise(kasus = sum(suicides)) %>% 
  ungroup() %>% 
  # the legend is kept so the female and male lines can be told apart
  ggplot(aes(x = year, y = kasus, col = sex)) +
  geom_line(size = 1.5) +
  labs(title = "Graph Sex-Based Suicide",
       subtitle = "from 1979 until 2016",
       y = "Total Suicides",
       x = "Year",
       col = "Sex")

Suicide cases based on gender and age

df2 %>% 
  group_by(age, sex) %>% 
  summarise(kasus = sum(suicides)) %>% 
  ungroup() %>%
  # x holds the summed counts and y the age groups, so the labels follow that
  ggplot(aes(x = kasus, y = age)) +
  geom_col(aes(fill = sex), position = "dodge") +
  labs(title = "Graph Sex & Age-Based Suicides",
       subtitle = "from 1979 until 2016",
       x = "Cumulative Suicides",
       y = "Age Group")
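
Because the age factor was created from character labels, its levels are sorted alphabetically, which places "5-14 years" between "35-54 years" and "55-74 years" (as seen in the summary output). A minimal sketch of one way to put the groups in age order is shown below; the level labels are copied from the summary output, while `age_levels`, `toy_age`, and `ordered_age` are names introduced here only for illustration.

```r
# Age labels as they appear in the data, listed from youngest to oldest
age_levels <- c("5-14 years", "15-24 years", "25-34 years",
                "35-54 years", "55-74 years", "75+ years")

# Hypothetical sample of age values; factor() sorts levels alphabetically
toy_age <- factor(c("15-24 years", "5-14 years", "75+ years", "35-54 years"))
levels(toy_age)

# Re-create the factor with an explicit level order
ordered_age <- factor(toy_age, levels = age_levels)
levels(ordered_age)
```

In the pipeline above, the same idea could be applied with `mutate(age = factor(age, levels = age_levels))` before the call to `ggplot()`, so the axis runs from the youngest to the oldest group.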

Suicide cases by Age Category

df2 %>% 
  group_by(age,year) %>% 
  summarise(kasus = sum(suicides)) %>% 
  ungroup() %>%
  ggplot(aes(x=year,y=kasus))+
  geom_line(aes(col=age),size=1.5)+
   labs(title = "Graph age-based suicides",
       subtitle = "from 1979 until 2016",
       x = "Year",
       y = "Suicides")

Top 10 countries with total suicides

df2 %>% 
  group_by(country) %>% 
  summarise(total = sum(suicides)) %>% 
  # rank explicitly by total instead of relying on the last column
  top_n(10, total) %>% 
  ggplot(aes(x = total, y = reorder(country, total))) +
  geom_col(fill = "firebrick") +
  labs(title = "Top 10 Countries by Total Suicides",
       subtitle = "from 1979 until 2016",
       x = "Cumulative Death",
       y = "Country")
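
In current dplyr, `top_n()` is marked as superseded in favour of `slice_max()`, which takes the ranking column as an explicit argument. The sketch below compares the two on a small made-up data frame (`toy` and its values are hypothetical); it is an illustration of the equivalent call, not a change to the analysis above.

```r
library(dplyr)

# Hypothetical toy data, only for illustration
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  total = c(40, 10, 30, 20),
  stringsAsFactors = FALSE
)

# Older style: keeps the top rows by `total`, in their original row order
top2_old <- toy %>% top_n(2, total)

# Newer style: same rows, returned sorted from largest to smallest
top2_new <- toy %>% slice_max(total, n = 2)

top2_old
top2_new
```

Both functions keep tied rows by default, so either can return more than the requested number of rows when totals are equal.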

Top 10 countries by total female suicides

df2 %>%
  filter(sex == "female") %>% 
  group_by(country) %>% 
  summarise(total = sum(suicides)) %>% 
  top_n(10, total) %>% 
  # fill is a fixed colour, so it is set in geom_col() rather than aes()
  ggplot(aes(x = total, y = reorder(country, total))) +
  geom_col(fill = "firebrick") +
  labs(title = "Top 10 Countries by Female Suicides",
       subtitle = "from 1979 until 2016",
       x = "Cumulative Death",
       y = "Country")

Top 10 countries by total male suicides

df2 %>%
  filter(sex == "male") %>% 
  group_by(country) %>% 
  summarise(total = sum(suicides)) %>% 
  top_n(10, total) %>% 
  # fill is a fixed colour, so it is set in geom_col() rather than aes()
  ggplot(aes(x = total, y = reorder(country, total))) +
  geom_col(fill = "firebrick") +
  labs(title = "Top 10 Countries by Male Suicides",
       subtitle = "from 1979 until 2016",
       x = "Cumulative Death",
       y = "Country")

Summary

From the data that has been processed and visualized, it can be concluded that:

1.The first graph shows an overall increase in suicides from 1979 to 2016. Cases decreased between 1983 and 1984 and then rose again from 1984.

2.The second graph shows that more suicides are recorded among males than among females. Female suicides tend to be stable over time.

3.The third graph shows that the highest number of suicides is among men aged 35-54 years and 55-74 years, and the same age groups lead among women. The same pattern also explains the fourth graph.

4.Most suicides occurred in Russia, USA, Japan, France and Ukraine

5.Most cases of female suicide occurred in Japan, Russia, USA, France and Korea

6.Most cases of male suicide occurred in Russia, USA, Japan, Ukraine and France