Introduction
1.1 Brief
The following data lists suicide counts for various countries from 1979 to 2016. It is taken from the Kaggle.com website and can be accessed at this
1.2 Data’s Point of View
Using this data, I will present the following graphs:
Graph 1: Total number of suicides per year across all countries, 1979-2016
Graph 2: Total number of suicides per year by sex, 1979-2016
Graph 3: Total number of suicides by sex and age group, 1979-2016
Graph 4: Total number of suicides per year by age category, 1979-2016
Graph 5: Top 10 countries with the most suicides
Set Up
Load Library
library(tidyverse) # attaches dplyr, ggplot2, tidyr, forcats, ...
options(scipen = 9999) # avoid scientific notation when printing numbers
options(dplyr.summarise.inform = FALSE)
options(dplyr.selecting.inform = FALSE)

Import Data
suicide <- read.csv("data_input/who_suicide_statistics.csv")
glimpse(suicide)
## Rows: 43,776
## Columns: 6
## $ country <chr> "Albania", "Albania", "Albania", "Albania", "Albania", "Al~
## $ year <int> 1985, 1985, 1985, 1985, 1985, 1985, 1985, 1985, 1985, 1985~
## $ sex <chr> "female", "female", "female", "female", "female", "female"~
## $ age <chr> "15-24 years", "25-34 years", "35-54 years", "5-14 years",~
## $ suicides_no <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
## $ population <int> 277900, 246800, 267500, 298300, 138700, 34200, 301400, 264~
Several columns have data types that do not match their content: country, sex, and age are stored as character, but they are categorical and should be factors.
Data Inspection
- Change Data Type
df <- suicide %>%
  # convert every character column to a factor
  mutate(across(where(is.character), as.factor))

- Check missing value
colSums(is.na(suicide))
## country year sex age suicides_no population
## 0 0 0 0 2256 5460
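The suicides_no column has 2256 missing values, which we fill with 0; note that this treats unreported counts as zero suicides. The population column has 5460 missing values, but it is not used in this analysis, so it is dropped.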
df1 <- df %>%
  select(-population) %>%
  mutate(suicides_no = ifelse(is.na(suicides_no), yes = 0, no = suicides_no))

Check the missing values again after replacing them with 0
colSums(is.na(df1))
## country year sex age suicides_no
## 0 0 0 0 0
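As an alternative to `ifelse()`, `tidyr::replace_na()` fills the missing values in a single call. The following is a minimal sketch on made-up toy data, where `toy` and `toy_filled` are hypothetical names. Passing `0L` keeps the column as an integer. The `ifelse()` call above uses a double `0`, so the resulting column becomes double.

```r
library(dplyr)
library(tidyr)

# Made-up rows standing in for the real suicide table
toy <- data.frame(
  country     = c("A", "A", "B"),
  suicides_no = c(NA, 12L, 5L),
  population  = c(1000L, NA, 2000L)
)

toy_filled <- toy %>%
  select(-population) %>%
  # 0L is an integer, so suicides_no stays an integer column
  mutate(suicides_no = replace_na(suicides_no, 0L))

colSums(is.na(toy_filled))
str(toy_filled)
```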
To make things easier, I rename the column suicides_no to suicides.
df2 <- df1 %>% rename(suicides = suicides_no)
str(df2)
## 'data.frame': 43776 obs. of 5 variables:
## $ country : Factor w/ 141 levels "Albania","Anguilla",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1985 1985 1985 1985 1985 1985 1985 1985 1985 1985 ...
## $ sex : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 2 2 2 2 ...
## $ age : Factor w/ 6 levels "15-24 years",..: 1 2 3 4 5 6 1 2 3 4 ...
## $ suicides: num 0 0 0 0 0 0 0 0 0 0 ...
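Because `as.factor()` sorts levels as text, "5-14 years" becomes the fourth age level instead of the first. As a result, the youngest group appears in the middle of the age-based plots. The following is a minimal sketch that uses `forcats::fct_relevel()` (forcats is attached by tidyverse) to move that group to the front. Here `ages` is a hypothetical vector holding the six labels from the data.

```r
library(forcats)

# The six age labels that appear in the data
ages <- c("15-24 years", "25-34 years", "35-54 years",
          "5-14 years", "55-74 years", "75+ years")

# as.factor() sorts the levels as text, so "5-14 years" is not first
age_factor <- as.factor(ages)
levels(age_factor)

# Move "5-14 years" to the front; the other levels keep their order
age_ordered <- fct_relevel(age_factor, "5-14 years")
levels(age_ordered)
```

To apply this to the report's data, run `df2 <- df2 %>% mutate(age = fct_relevel(age, "5-14 years"))` before plotting.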
Data Explanation
Next, I summarize each column to get an overview of the data.
summary(df2)
## country year sex age
## Austria : 456 Min. :1979 female:21888 15-24 years:7296
## Hungary : 456 1st Qu.:1990 male :21888 25-34 years:7296
## Iceland : 456 Median :1999 35-54 years:7296
## Israel : 456 Mean :1999 5-14 years :7296
## Mauritius : 456 3rd Qu.:2007 55-74 years:7296
## Netherlands: 456 Max. :2016 75+ years :7296
## (Other) :41040
## suicides
## Min. : 0.0
## 1st Qu.: 0.0
## Median : 11.0
## Mean : 183.4
## 3rd Qu.: 83.0
## Max. :22338.0
##
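Here are the insights from the summary:
1. The data starts in 1979 and ends in 2016.
2. The mean number of suicides per record (one country, year, sex, and age group) is 183.4. This is an average count, not a suicide rate, and it includes the missing values that were replaced with 0.
3. The largest number of suicides in a single record is 22338.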
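These numbers can also be computed directly, which makes clear what each one measures. The following is a minimal sketch on made-up toy data in the shape of `df2`. The names `toy`, `overview`, and `top_row` are hypothetical. `slice_max()` returns the full row behind the maximum, showing which country, year, sex, and age group it belongs to.

```r
library(dplyr)

# Made-up rows in the same shape as df2
toy <- data.frame(
  country  = c("A", "A", "B", "B"),
  year     = c(1979L, 1980L, 1985L, 2016L),
  sex      = c("male", "female", "male", "female"),
  age      = c("35-54 years", "35-54 years", "55-74 years", "75+ years"),
  suicides = c(10, 0, 250, 4)
)

# First and last year, plus the mean and maximum count per row
overview <- toy %>%
  summarise(first_year   = min(year),
            last_year    = max(year),
            mean_per_row = mean(suicides),
            max_per_row  = max(suicides))
overview

# The row (country, year, sex, age) that holds the maximum
top_row <- toy %>% slice_max(suicides, n = 1)
top_row
```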
Visualization
Suicide cases based on years
df2 %>%
  group_by(year) %>%
  summarise(kasus = sum(suicides)) %>%
  ggplot(aes(x = year, y = kasus)) +
  # a fixed colour goes outside aes(); inside aes() it is mapped like data
  # linewidth replaces the deprecated size for lines (ggplot2 >= 3.4.0)
  geom_line(color = "red", linewidth = 1.5) +
  labs(title = "Graph Year-Based Suicide",
       subtitle = "from 1979 until 2016",
       y = "Total Suicides",
       x = "Year")

Sex-based suicides
df2 %>%
  group_by(sex, year) %>%
  summarise(kasus = sum(suicides)) %>%
  ungroup() %>%
  ggplot(aes(x = year, y = kasus, col = sex)) +
  geom_line(linewidth = 1.5) +
  # the legend is kept: it is the only way to tell the two lines apart
  labs(title = "Graph Sex-Based Suicide",
       subtitle = "from 1979 until 2016",
       y = "Total Suicides",
       x = "Year")

Suicide cases based on gender and age
df2 %>%
  group_by(age, sex) %>%
  summarise(kasus = sum(suicides)) %>%
  ungroup() %>%
  ggplot(aes(x = kasus, y = age)) +
  geom_col(aes(fill = sex), position = "dodge") +
  # x holds the suicide totals, y holds the age groups
  labs(title = "Graph Sex & Age-Based Suicides",
       subtitle = "from 1979 until 2016",
       x = "Total Suicides",
       y = "Age Group")

Suicide cases by age category
df2 %>%
  group_by(age, year) %>%
  summarise(kasus = sum(suicides)) %>%
  ungroup() %>%
  ggplot(aes(x = year, y = kasus)) +
  geom_line(aes(col = age), linewidth = 1.5) +
  labs(title = "Graph Age-Based Suicides",
       subtitle = "from 1979 until 2016",
       x = "Year",
       y = "Suicides")

Top 10 countries with total suicides
df2 %>%
  group_by(country) %>%
  summarise(total = sum(suicides)) %>%
  # slice_max() replaces the superseded top_n(); ties are kept by default
  slice_max(total, n = 10) %>%
  ggplot(aes(x = total, y = reorder(country, total))) +
  geom_col(fill = "firebrick") +
  labs(title = "Top 10 Country Suicides",
       subtitle = "from 1979 until 2016",
       x = "Total Suicides",
       y = "Country")

Top 10 countries by total female suicides
df2 %>%
  filter(sex == "female") %>%
  group_by(country) %>%
  summarise(total = sum(suicides)) %>%
  slice_max(total, n = 10) %>%
  ggplot(aes(x = total, y = reorder(country, total))) +
  # fill is set outside aes() so the bars are actually firebrick
  geom_col(fill = "firebrick") +
  labs(title = "Top 10 Country Female Suicides",
       subtitle = "from 1979 until 2016",
       x = "Total Suicides",
       y = "Country")

Top 10 countries by total male suicides
df2 %>%
  filter(sex == "male") %>%
  group_by(country) %>%
  summarise(total = sum(suicides)) %>%
  slice_max(total, n = 10) %>%
  ggplot(aes(x = total, y = reorder(country, total))) +
  # fill is set outside aes() so the bars are actually firebrick
  geom_col(fill = "firebrick") +
  labs(title = "Top 10 Country Male Suicides",
       subtitle = "from 1979 until 2016",
       x = "Total Suicides",
       y = "Country")

Summary
From the data that has been processed and visualized, it can be concluded that:
1. The first graph shows an overall increase in suicides from 1979 to 2016. Suicides decreased between 1983 and 1984 and then rose again from 1984 onward.
2. The second graph shows that males account for far more suicides than females, and that the number of female suicides stays relatively stable over time.
3. The third graph shows that the highest numbers of suicides are among men aged 35-54 and 55-74 years. Among women, the same two age groups have the highest numbers. The fourth graph, which breaks suicides down by age category over time, reflects the same pattern.
4. Most suicides occurred in Russia, the USA, Japan, France, and Ukraine.
5. Most female suicides occurred in Japan, Russia, the USA, France, and Korea.
6. Most male suicides occurred in Russia, the USA, Japan, Ukraine, and France.
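The points above are read off the plots, but they can also be checked numerically. The following is a minimal sketch on made-up toy data in the shape of `df2`. The names `toy`, `yearly`, `male_share`, and `top_by_sex` are hypothetical, and `n = 2` stands in for the report's top 10. The sketch covers three checks:

- `dplyr::lag()` gives the change from the previous year (point 1).
- A per-year male share checks point 2.
- `slice_max()` on data grouped by sex gives the top countries for each sex (points 5 and 6).

```r
library(dplyr)

# Made-up data in the same shape as df2
toy <- data.frame(
  country  = rep(c("A", "B", "C"), each = 4),
  year     = rep(c(1983L, 1983L, 1984L, 1984L), times = 3),
  sex      = rep(c("female", "male"), times = 6),
  suicides = c(5, 20, 4, 18,   # country A
               2, 10, 3, 9,    # country B
               1,  3, 1, 2)    # country C
)

# Point 1: yearly total and its change from the previous year
yearly <- toy %>%
  group_by(year) %>%
  summarise(total = sum(suicides)) %>%
  arrange(year) %>%
  mutate(change = total - lag(total))
yearly

# Point 2: share of each year's suicides committed by males
male_share <- toy %>%
  group_by(year) %>%
  summarise(share_male = sum(suicides[sex == "male"]) / sum(suicides))
male_share

# Points 5 and 6: top countries within each sex
top_by_sex <- toy %>%
  group_by(sex, country) %>%
  summarise(total = sum(suicides), .groups = "drop_last") %>%  # still grouped by sex
  slice_max(total, n = 2) %>%
  ungroup()
top_by_sex
```

With the real data, replace `toy` with `df2` and `n = 2` with `n = 10`.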