Overview

What is Suicide?

Suicide is death caused by injuring oneself with the intent to die. A suicide attempt is when someone harms themselves with the intent to end their life, but they do not die as a result of their actions. Several factors can increase the risk for suicide and protect against it. Suicide is connected to other forms of injury and violence, and causes serious health and economic consequences.

About This Project

In this small project, we explore a dataset of suicide rates from 1985 to 2016. The dataset contains several variables (columns) that serve as the basis for extracting useful information.

Objectives

Not all information in the dataset will be explored in this project. Instead, we set a few simple goals that form the basis of the analysis:

  1. Identify the age group, gender, generation and year with the most suicides in the world.
  2. Summarize simple statistics for the age group, gender, generation and year with the most suicides in the world.
  3. Identify the country with the most suicides.
  4. Break down suicides by age group, gender, generation and year for the country with the most suicides.
  5. Summarize simple statistics for the country with the most suicides.

Input Data

Data Inspection

# Read data
data <- read.csv("suicide_rate_overview_1985to2016.csv", sep =",")
data
dim(data)
## [1] 27820    12
names(data)
##  [1] "ï..country"         "year"               "sex"               
##  [4] "age"                "suicides_no"        "population"        
##  [7] "suicides.100k.pop"  "country.year"       "HDI.for.year"      
## [10] "gdp_for_year...."   "gdp_per_capita...." "generation"

From our inspection we can conclude:

  1. The dataset contains 27,820 rows and 12 columns.
  2. The column names are "ï..country", "year", "sex", "age", "suicides_no", "population", "suicides.100k.pop", "country.year", "HDI.for.year", "gdp_for_year....", "gdp_per_capita...." and "generation".
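The stray "ï.." prefix on the first column name is typical of a UTF-8 byte order mark (BOM) at the start of a CSV file. The sketch below is only an illustration on a temporary file with made-up contents, not the project's actual file; it assumes the cause here is a BOM and shows that `fileEncoding = "UTF-8-BOM"` lets `read.csv()` strip it.

```r
# Write a tiny CSV that starts with a UTF-8 byte order mark (EF BB BF).
# The file contents are made up for illustration.
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("country,year\nAlbania,1987\n")), tmp)

# Reading with fileEncoding = "UTF-8-BOM" removes the mark from the first header
clean <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(clean)
```

If the file were read this way, the first column would already be called `country` and would not need renaming later.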

Data Cleansing and Coercion

Looking for missing values in each column

colSums(is.na(data))
##         ï..country               year                sex                age 
##                  0                  0                  0                  0 
##        suicides_no         population  suicides.100k.pop       country.year 
##                  0                  0                  0                  0 
##       HDI.for.year   gdp_for_year.... gdp_per_capita....         generation 
##              19456                  0                  0                  0
mean(is.na(data$HDI.for.year))*100
## [1] 69.9353

The NA check above shows that the HDI.for.year column has more than 50% missing values (about 69.9%). We therefore drop it using the dplyr package; country.year is dropped in the same step.

library(dplyr)
data <- data %>% 
  select(-c(HDI.for.year, country.year))

head(data)

The country.year column does not contain additional information because it is already represented by the country and year columns.
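The "more than 50% missing" rule applied above can also be expressed programmatically instead of naming the column by hand. This is a minimal base R sketch on a made-up data frame (`toy` and its columns are hypothetical), not the code used in this project.

```r
# Hypothetical toy data: column b is 75% missing, column c is 25% missing
toy <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(NA, NA, NA, 4),
  c = c(1, NA, 3, 4)
)

# Share of missing values per column
na_share <- colMeans(is.na(toy))

# Keep only columns with at most 50% missing values
kept <- toy[, na_share <= 0.5, drop = FALSE]
names(kept)
```

The threshold of 0.5 mirrors the rule stated in the text; a different cut-off would only change the comparison.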

Check the data type of each column

str(data)
## 'data.frame':    27820 obs. of  10 variables:
##  $ ï..country        : chr  "Albania" "Albania" "Albania" "Albania" ...
##  $ year              : int  1987 1987 1987 1987 1987 1987 1987 1987 1987 1987 ...
##  $ sex               : chr  "male" "male" "female" "male" ...
##  $ age               : chr  "15-24 years" "35-54 years" "15-24 years" "75+ years" ...
##  $ suicides_no       : int  21 16 14 1 9 1 6 4 1 0 ...
##  $ population        : int  312900 308000 289700 21800 274300 35600 278800 257200 137500 311000 ...
##  $ suicides.100k.pop : num  6.71 5.19 4.83 4.59 3.28 2.81 2.15 1.56 0.73 0 ...
##  $ gdp_for_year....  : chr  "2,156,624,900" "2,156,624,900" "2,156,624,900" "2,156,624,900" ...
##  $ gdp_per_capita....: int  796 796 796 796 796 796 796 796 796 796 ...
##  $ generation        : chr  "Generation X" "Silent" "Generation X" "G.I. Generation" ...

Convert data types and rename columns

data <- data %>% 
  rename("country" = "ï..country",
         "suicides(/100k.pop)" = "suicides.100k.pop",
         "gdp.peryear($)" = "gdp_for_year....",
         "gdp.percapita($)" = "gdp_per_capita....") %>% 
  mutate_at(c("country","sex", "age", "generation"), as.factor) %>%
  # Adjust factor levels in "generation" columns
  mutate(generation = factor(generation, levels = c("G.I. Generation", 
                                                    "Silent",
                                                    "Boomers", 
                                                    "Generation X", 
                                                    "Millenials",
                                                    "Generation Z"))) %>%
  # Adjust factor levels in "age" columns
  mutate(age = factor(age, levels = c("5-14 years",
                                      "15-24 years",
                                      "25-34 years",
                                      "35-54 years",
                                      "55-74 years",
                                      "75+ years"))) %>% 
  # Strip thousands separators so yearly GDP can be stored as a number
  mutate(`gdp.peryear($)` = as.numeric(gsub(",", "", `gdp.peryear($)`, fixed = TRUE)))
levels(data$age)
## [1] "5-14 years"  "15-24 years" "25-34 years" "35-54 years" "55-74 years"
## [6] "75+ years"
levels(data$generation)
## [1] "G.I. Generation" "Silent"          "Boomers"         "Generation X"   
## [5] "Millenials"      "Generation Z"
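The two coercions above (removing thousands separators and fixing the factor order) can be tried in isolation. The sketch below uses base R on small hand-made vectors; the first GDP string is the Albania value shown in the `str()` output, while `age_raw` is a made-up sample.

```r
# GDP arrives as text with comma separators, so as.numeric() alone would return NA
gdp_raw <- "2,156,624,900"
gdp_num <- as.numeric(gsub(",", "", gdp_raw, fixed = TRUE))

# Without explicit levels, factor() orders labels alphabetically
age_raw <- c("75+ years", "5-14 years", "15-24 years")
default_levels <- levels(factor(age_raw))

# With explicit levels, the age groups sort from youngest to oldest
age <- factor(age_raw, levels = c("5-14 years", "15-24 years", "75+ years"))
ordered_ages <- as.character(sort(age))
```

Setting the levels explicitly matters mostly for tables and plots, where groups are displayed in level order.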

American Generations Timeline

Though there is a consensus on the general time period for generations, there is not an agreement on the exact year that each generation begins and ends.

1. GI Generation
Born 1901-1924 (Age 90+)
*They were teenagers during the Great Depression and fought in World War II. Sometimes called the greatest generation (following a book by journalist Tom Brokaw) or the swing generation because of their jazz music.

2. Silent Generation
Born 1925-1942 (Age 72-89)
*They were too young to see action in World War II and too old to participate in the fun of the Summer of Love. This label describes their conformist tendencies and belief that following the rules was a sure ticket to success.

3. Baby Boomers
Born 1943-1964 (Age 50-71)
*The boomers were born during an economic and baby boom following World War II. These hippie kids protested against the Vietnam War and participated in the civil rights movement, all with rock ‘n’ roll music blaring in the background.

4. Generation X
Born 1965-1979 (Age 35-49)
*They were originally called the baby busters because fertility rates fell after the boomers. As teenagers, they experienced the AIDS epidemic and the fall of the Berlin Wall. Sometimes called the MTV Generation, the “X” in their name refers to this generation’s desire not to be defined.

5. Millennials
Born 1980-2000 (Age 14-34)
*They experienced the rise of the Internet, Sept. 11 and the wars that followed. Sometimes called Generation Y. Because of their dependence on technology, they are said to be entitled and narcissistic.

6. Generation Z
Born 2001-2013 (Age 1-13)
*These kids were the first born with the Internet and are suspected to be the most individualistic and technology-dependent generation. Sometimes referred to as the iGeneration.

Source: https://www.npr.org/

Filter out years with little data

The data for 2016 are sparse: that year contributes only 160 of the 27,820 rows. We therefore drop all rows from 2016.

data %>%  
  group_by(year) %>% 
  summarise(n = n())
data <- data %>% 
  filter(year != 2016)

Summary

Statistic

summary(data)
##       country           year          sex                 age      
##  Argentina:  372   Min.   :1985   female:13830   5-14 years :4610  
##  Austria  :  372   1st Qu.:1994   male  :13830   15-24 years:4610  
##  Belgium  :  372   Median :2002                  25-34 years:4610  
##  Brazil   :  372   Mean   :2001                  35-54 years:4610  
##  Chile    :  372   3rd Qu.:2008                  55-74 years:4610  
##  Colombia :  372   Max.   :2015                  75+ years  :4610  
##  (Other)  :25428                                                   
##   suicides_no        population       suicides(/100k.pop) gdp.peryear($)     
##  Min.   :    0.0   Min.   :     278   Min.   :  0.00      Min.   :4.692e+07  
##  1st Qu.:    3.0   1st Qu.:   97535   1st Qu.:  0.91      1st Qu.:8.976e+09  
##  Median :   25.0   Median :  430725   Median :  5.98      Median :4.801e+10  
##  Mean   :  243.4   Mean   : 1850689   Mean   : 12.81      Mean   :4.471e+11  
##  3rd Qu.:  132.0   3rd Qu.: 1491041   3rd Qu.: 16.60      3rd Qu.:2.602e+11  
##  Max.   :22338.0   Max.   :43805214   Max.   :224.97      Max.   :1.812e+13  
##                                                                              
##  gdp.percapita($)           generation  
##  Min.   :   251   G.I. Generation:2744  
##  1st Qu.:  3436   Silent         :6332  
##  Median :  9283   Boomers        :4958  
##  Mean   : 16816   Generation X   :6376  
##  3rd Qu.: 24796   Millenials     :5780  
##  Max.   :126352   Generation Z   :1470  
## 

Data structure

str(data)
## 'data.frame':    27660 obs. of  10 variables:
##  $ country            : Factor w/ 101 levels "Albania","Antigua and Barbuda",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year               : int  1987 1987 1987 1987 1987 1987 1987 1987 1987 1987 ...
##  $ sex                : Factor w/ 2 levels "female","male": 2 2 1 2 2 1 1 1 2 1 ...
##  $ age                : Factor w/ 6 levels "5-14 years","15-24 years",..: 2 4 2 6 3 6 4 3 5 1 ...
##  $ suicides_no        : int  21 16 14 1 9 1 6 4 1 0 ...
##  $ population         : int  312900 308000 289700 21800 274300 35600 278800 257200 137500 311000 ...
##  $ suicides(/100k.pop): num  6.71 5.19 4.83 4.59 3.28 2.81 2.15 1.56 0.73 0 ...
##  $ gdp.peryear($)     : num  2.16e+09 2.16e+09 2.16e+09 2.16e+09 2.16e+09 ...
##  $ gdp.percapita($)   : int  796 796 796 796 796 796 796 796 796 796 ...
##  $ generation         : Factor w/ 6 levels "G.I. Generation",..: 4 2 4 1 3 1 2 3 1 4 ...

Variable explanation

  1. country : The name of the country listed in the dataset.
  2. year : Year of the record (1985-2016).
  3. sex : Gender (male and female).
  4. age : Age, grouped into age groups (factor).
  5. suicides_no : Number of suicide cases.
  6. population : Number of people in the given country, year, sex and age group.
  7. suicides(/100k.pop) : Number of suicides per 100,000 population.
  8. gdp.peryear($) : The total monetary or market value of all the finished goods and services produced within a country’s borders in a given year.
  9. gdp.percapita($) : A measure of a country’s economic output that accounts for its number of people. It divides the country’s gross domestic product by its total population.
  10. generation : All of the people born and living at about the same time, regarded collectively.
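The relationship between `suicides_no`, `population` and `suicides(/100k.pop)` can be checked by hand. The sketch below rebuilds the rate for the first four Albania 1987 rows shown in the `str()` output above; the data frame name `albania_1987` and the column `rate_per_100k` are introduced here only for illustration.

```r
# First four rows of the dataset, copied from the str() output above
albania_1987 <- data.frame(
  suicides_no = c(21, 16, 14, 1),
  population  = c(312900, 308000, 289700, 21800)
)

# Rate per 100,000 population = cases / population * 100,000
albania_1987$rate_per_100k <- round(
  albania_1987$suicides_no / albania_1987$population * 1e5, 2
)
albania_1987$rate_per_100k
```

The results agree with the first four values of `suicides(/100k.pop)` in the data (6.71, 5.19, 4.83, 4.59).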

Country list

# number of countries
length(unique(data$country))
## [1] 100
# country list
unique(data$country)
##   [1] Albania                      Antigua and Barbuda         
##   [3] Argentina                    Armenia                     
##   [5] Aruba                        Australia                   
##   [7] Austria                      Azerbaijan                  
##   [9] Bahamas                      Bahrain                     
##  [11] Barbados                     Belarus                     
##  [13] Belgium                      Belize                      
##  [15] Bosnia and Herzegovina       Brazil                      
##  [17] Bulgaria                     Cabo Verde                  
##  [19] Canada                       Chile                       
##  [21] Colombia                     Costa Rica                  
##  [23] Croatia                      Cuba                        
##  [25] Cyprus                       Czech Republic              
##  [27] Denmark                      Dominica                    
##  [29] Ecuador                      El Salvador                 
##  [31] Estonia                      Fiji                        
##  [33] Finland                      France                      
##  [35] Georgia                      Germany                     
##  [37] Greece                       Grenada                     
##  [39] Guatemala                    Guyana                      
##  [41] Hungary                      Iceland                     
##  [43] Ireland                      Israel                      
##  [45] Italy                        Jamaica                     
##  [47] Japan                        Kazakhstan                  
##  [49] Kiribati                     Kuwait                      
##  [51] Kyrgyzstan                   Latvia                      
##  [53] Lithuania                    Luxembourg                  
##  [55] Macau                        Maldives                    
##  [57] Malta                        Mauritius                   
##  [59] Mexico                       Montenegro                  
##  [61] Netherlands                  New Zealand                 
##  [63] Nicaragua                    Norway                      
##  [65] Oman                         Panama                      
##  [67] Paraguay                     Philippines                 
##  [69] Poland                       Portugal                    
##  [71] Puerto Rico                  Qatar                       
##  [73] Republic of Korea            Romania                     
##  [75] Russian Federation           Saint Kitts and Nevis       
##  [77] Saint Lucia                  Saint Vincent and Grenadines
##  [79] San Marino                   Serbia                      
##  [81] Seychelles                   Singapore                   
##  [83] Slovakia                     Slovenia                    
##  [85] South Africa                 Spain                       
##  [87] Sri Lanka                    Suriname                    
##  [89] Sweden                       Switzerland                 
##  [91] Thailand                     Trinidad and Tobago         
##  [93] Turkey                       Turkmenistan                
##  [95] Ukraine                      United Arab Emirates        
##  [97] United Kingdom               United States               
##  [99] Uruguay                      Uzbekistan                  
## 101 Levels: Albania Antigua and Barbuda Argentina Armenia Aruba ... Uzbekistan

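There are 100 countries in the dataset, so not every country in the world is covered. The factor still reports 101 levels because one country had rows only in 2016; its level remains after those rows were filtered out.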
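The gap between the number of distinct values and the number of factor levels comes from how R handles factors after filtering: unused levels are kept until they are dropped explicitly. A minimal sketch with hypothetical country names:

```r
# Hypothetical example: "Country B" only has a 2016 row
country <- factor(c("Country A", "Country B", "Country A"))
year    <- c(2015, 2016, 2015)

# Filtering rows does not remove the now-unused level
kept <- country[year != 2016]
n_levels_before <- nlevels(kept)          # levels still include "Country B"
n_unique        <- length(unique(kept))   # values actually present

# droplevels() removes levels that no longer occur
n_levels_after <- nlevels(droplevels(kept))
```

Calling `droplevels()` on the filtered data would make the level count match the number of countries actually present.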

From the data inspection and cleansing above, we can conclude:

  1. Before cleansing, the data contain 27,820 rows and 12 columns; after cleansing, 27,660 rows and 10 columns.
  2. There are 100 countries in the dataset.
  3. The dataset spans 1985 to 2016, but 2016 was dropped during data cleansing, so this project covers 1985 to 2015.
  4. The age and generation columns are each grouped into 6 levels.

Data Exploration

In this section, we look for answers to the objectives defined above.

World

Year

worldyear <- data %>% 
  group_by(year) %>% 
  summarise(sum_suicides = sum(`suicides(/100k.pop)`)) %>% 
  arrange(-sum_suicides) %>% 
  mutate(year = as.factor(year))

worldyear
# visualization
library(ggplot2)
worldyear %>% 
  ggplot(aes(x = sum_suicides, y = reorder(year, sum_suicides), fill = year)) +
  geom_bar(stat = "identity") +
  theme(legend.position = "none")

We can see that 1995 is the year with the highest summed suicide rate (14,660.26 per 100,000 population, summed over every country, sex and age-group row).

Descriptive Statistics

summaryyear <- data %>% 
  group_by(year) %>% 
  summarise(sum_suicides = sum(`suicides(/100k.pop)`)) %>% 
  arrange(-sum_suicides)

summary(summaryyear$sum_suicides)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6580   10314   11844   11432   13723   14660

  1. The mean of the yearly summed suicide rates across the 100 countries from 1985 to 2015 is 11,432.14 per 100,000 population.
  2. The smallest yearly sum from 1985 to 2015 is 6,580 per 100,000 population, which occurred in 1986 (see the data frame above).
  3. The largest yearly sum from 1985 to 2015 is 14,660 per 100,000 population, which occurred in 1995 (see the data frame above).
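The figures above add up per-100,000 rates across rows, so every country, sex and age group counts equally regardless of its population size. An alternative, shown here only as a sketch on made-up numbers, is a pooled rate: total cases divided by total population. The data frame `toy` and its values are hypothetical, and this is not the method used in this project.

```r
# Hypothetical data: a small and a large population group in each year
toy <- data.frame(
  year        = c(2000, 2000, 2001, 2001),
  suicides_no = c(10, 200, 20, 100),
  population  = c(1e5, 1e6, 1e5, 1e6)
)
toy$rate <- toy$suicides_no / toy$population * 1e5

# Approach used in this project: sum the per-row rates within each year
sum_of_rates <- tapply(toy$rate, toy$year, sum)

# Pooled alternative: total cases over total population within each year
totals <- aggregate(cbind(suicides_no, population) ~ year, data = toy, FUN = sum)
totals$pooled_rate <- totals$suicides_no / totals$population * 1e5
```

In this toy example both years have the same sum of rates, while the pooled rate is clearly higher in 2000, which shows that the two measures can rank years differently.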

Sex

worldsex <- data %>% 
  group_by(sex) %>% 
  summarise(sum_suicides = sum(`suicides(/100k.pop)`)) %>% 
  arrange(-sum_suicides)

worldsex
# visualization
y <- worldsex$sum_suicides
z <- worldsex$sex

piepercent <- round(100*y/sum(y), 1)

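# Label slices with their percentage share; legend entries follow the row order of worldsex
pie(y, labels = paste0(piepercent, "%"), main = "Share of summed suicide rates by sex",
    col = rainbow(length(y)))
legend("topright", legend = as.character(z), cex = 0.8,
   fill = rainbow(length(y)))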

We can see that males have the highest summed suicide rate (279,767.16 per 100,000 population, or 78.9% of the total), while females account for 21.1% of the total (74,629.28 per 100,000 population).
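The percentages quoted above follow directly from the two sums. A short check in base R, using the figures reported in the text (the vector name `sums` is introduced here for illustration):

```r
# Summed rates by sex, as reported above
sums <- c(male = 279767.16, female = 74629.28)

# Share of each sex in the total, in percent, rounded to one decimal
share <- round(100 * prop.table(sums), 1)
share
```

`prop.table()` divides each value by the total, which is the same calculation as `100 * y / sum(y)` in the plotting code.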

Age Group

data %>% 
  group_by(age) %>% 
  summarise(sum_suicides = sum(`suicides(/100k.pop)`)) %>% 
  arrange(-sum_suicides)

We can see that 75+ years is the age group with the highest summed suicide rate (110,532.19 per 100,000 population).

Generation

data %>% 
  group_by(generation) %>% 
  summarise(sum_suicides = sum(`suicides(/100k.pop)`)) %>% 
  arrange(-sum_suicides)

We can see that the Silent generation has the highest summed suicide rate (116,548.73 per 100,000 population).

Country with the highest number of suicides

data %>% 
  group_by(country) %>% 
  summarise(sum_suicide = sum(`suicides(/100k.pop)`)) %>% 
  arrange(-sum_suicide) %>% 
  head(3)

We can see that the country with the highest summed suicide rate from 1985 to 2015 is the Russian Federation. Next, we explore the suicide data for the Russian Federation.

Year

data %>% 
  filter(country == "Russian Federation") %>% 
  group_by(year) %>% 
  summarise(sum_suicide = sum(`suicides(/100k.pop)`)) %>% 
  arrange(-sum_suicide)

In 1994, the summed suicide rate reached 567.64 per 100,000 population. This is the highest yearly value for the Russian Federation.

Sex

data %>% 
  filter(country == "Russian Federation",
         year == 1994) %>% 
  group_by(sex) %>% 
  summarise(sum_suicide = sum(`suicides(/100k.pop)`)) %>% 
  arrange(-sum_suicide)

Grouping the 1994 Russian Federation data by gender shows that the higher summed suicide rate occurred among males (477.82 per 100,000 population). The value for females is 89.82 per 100,000 population.

The sections below look further at the 1994 Russian Federation data, split three ways: both sexes together (Male & Female), male only, and female only.

Male & Female

data %>% 
   filter(country == "Russian Federation",
         year == 1994) %>% 
  group_by(age) %>% 
  summarise(sum_suicide = sum(`suicides(/100k.pop)`)) %>% 
  arrange(-sum_suicide)

The data above show that the age group with the highest summed suicide rate is 75+ years. Next, we find the generation with the highest value within the 75+ years age group for both genders.

data %>% 
   filter(country == "Russian Federation",
         year == 1994,
         age == "75+ years") %>% 
  group_by(generation) %>% 
  summarise(sum_suicide = sum(`suicides(/100k.pop)`)) %>% 
  arrange(-sum_suicide)

The G.I. Generation is the generation with the highest summed suicide rate in the 75+ years age group for both genders (142.31 per 100,000 population).

Male

data %>% 
  filter(country == "Russian Federation",
         year == 1994,
         sex == "male") %>% 
  group_by(age) %>% 
  summarise(sum_suicide = sum(`suicides(/100k.pop)`)) %>% 
  arrange(-sum_suicide)

The data above show that the age group with the highest summed suicide rate among males is 35-54 years. Next, we find the generation with the highest value within the 35-54 years age group for males.

data %>% 
  filter(country == "Russian Federation",
         year == 1994,
         sex == "male",
         age == "35-54 years") %>% 
  group_by(generation) %>% 
  summarise(sum_suicide = sum(`suicides(/100k.pop)`)) %>% 
  arrange(-sum_suicide)

Boomers are the generation with the highest summed suicide rate in the 35-54 years age group for males.

Female

data %>% 
  filter(country == "Russian Federation",
         year == 1994,
         sex == "female") %>% 
  group_by(age) %>% 
  summarise(sum_suicide = sum(`suicides(/100k.pop)`)) %>% 
  arrange(-sum_suicide)

The data above show that the age group with the highest summed suicide rate among females is 75+ years. Next, we find the generation with the highest value within the 75+ years age group for females.

data %>% 
  filter(country == "Russian Federation",
         year == 1994,
         sex == "female",
         age == "75+ years") %>% 
  group_by(generation) %>% 
  summarise(sum_suicide = sum(`suicides(/100k.pop)`)) %>% 
  arrange(-sum_suicide)

The G.I. Generation is the generation with the highest summed suicide rate in the 75+ years age group for females.
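The Russian Federation breakdowns above repeat the same steps: filter the rows, group by one column, sum the rates and sort. A small helper can reduce the copy-and-paste that makes a wrong filter value easy to miss. The function `sum_rate_by()` and the toy data below are hypothetical and written in base R; they are a sketch, not part of the original analysis.

```r
# Hypothetical helper: sum a rate column within groups and sort descending
sum_rate_by <- function(df, group_col, rate_col = "rate") {
  out <- aggregate(df[[rate_col]], by = list(df[[group_col]]), FUN = sum)
  names(out) <- c(group_col, "sum_suicide")
  out[order(-out$sum_suicide), ]
}

# Made-up rows standing in for one country and one year
toy <- data.frame(
  sex  = c("male", "male", "female", "female"),
  age  = c("35-54 years", "75+ years", "35-54 years", "75+ years"),
  rate = c(40, 30, 5, 10)
)

# Both sexes together, then females only
by_age        <- sum_rate_by(toy, "age")
by_age_female <- sum_rate_by(toy[toy$sex == "female", ], "age")
```

With a helper like this, each section only differs in the subset passed in, so the filter condition is the single place to check.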

Conclusion

Suicide occurs more often in older than in younger people, but is still one of the leading causes of death in late childhood and adolescence worldwide. In this dataset, the year with the highest summed suicide rate in the world from 1985 to 2015 is 1995. Over the same period, the sex with the highest value is male and the age group with the highest value is 75+ years. Finally, the Silent generation is the generation with the highest value over that period.

In this dataset, the Russian Federation was the country with the highest summed suicide rate from 1985 to 2015, reaching 11,305.13 per 100,000 population over that period. Within that period, 1994 was the year with the highest value in the Russian Federation (567.64 per 100,000 population), of which 477.82 per 100,000 population came from men and 89.82 per 100,000 population from women. The 75+ years age group accounts for the largest share and is dominated by the G.I. Generation.