This analysis aims to gain insight into suicide trends in Poland. The main dataset for our project was compiled by the United Nations Development Program and was retrieved from http://databank.worldbank.org/data/source/world-development-indicators# . The dataset has records from 1985 to 2016 and contains 27820 observations and 12 columns. The features we are interested in are:
1. Country
2. Year: from 1985 to 2016
3. Sex
4. Age: age groups “5-14”, “15-24”, “25-34”, “35-54”, “55-74”, and “75+”
5. Suicides_no: number of suicides
6. Population
7. Generation
To put the analysis in context, we first review a detailed description of the dataset.
Variable | Description
country | The countries for which data was collated. There are 101 countries in total.
year | The years covered by the dataset. There are 32 years in total, from 1985 to 2016.
sex | The sex of the group: male or female.
age | The age groups: “5-14 years”, “15-24 years”, “25-34 years”, “35-54 years”, “55-74 years” and “75+ years”.
suicides_no | The total number of deaths by suicide for each case.
population | The total population per country.
generation | The age groups categorized by generation: “Generation X”, “Silent”, “G.I. Generation”, “Boomers”, “Millenials” and “Generation Z”.
First, we load the packages used in the analysis.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.0.2 ✔ forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.1.2
## Warning: package 'tibble' was built under R version 4.1.2
## Warning: package 'tidyr' was built under R version 4.1.2
## Warning: package 'dplyr' was built under R version 4.1.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(ggthemes)
library(RColorBrewer)
## Warning: package 'RColorBrewer' was built under R version 4.1.2
library(patchwork)
## Warning: package 'patchwork' was built under R version 4.1.2
library(ggplot2)
library(dplyr)
library(tidyr)
Loading the main dataset
suicide <- read_csv("/Users/macbookpro/Downloads/DS & BA/YEAR 2/semester 1/ADV VIS /adv vis/Adv Vis 1/suicide homework/master.csv")
## Rows: 27820 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): country, sex, age, country-year, generation
## dbl (6): year, suicides_no, population, suicides/100k pop, HDI for year, gdp...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
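As a quick sanity check on the figures quoted in the dataset description, the row count, number of countries, and year range can be computed directly from the loaded data. The sketch below is only an illustration: it uses a small made-up data frame (`toy`, a hypothetical stand-in for `suicide`) that borrows the dataset's column names.

```r
library(dplyr)

# Hypothetical stand-in for `suicide`; the values are invented for illustration
toy <- tibble(
  country     = c("Poland", "Poland", "Albania", "Albania"),
  year        = c(1990, 2015, 1987, 2010),
  sex         = c("male", "female", "male", "female"),
  suicides_no = c(10, 2, 1, 0),
  population  = c(1000, 1200, 500, 600)
)

# One-row overview: size of the data, number of countries, and year range
overview <- toy %>%
  summarise(
    n_rows      = n(),
    n_countries = n_distinct(country),
    first_year  = min(year),
    last_year   = max(year)
  )

print(overview)
```

Running the same `summarise()` call on `suicide` instead of `toy` would show whether the described number of countries and years matches the file.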
Since this analysis focuses only on Poland, we filter the data accordingly.
suicide_Poland <- suicide %>% filter(country == 'Poland')
Although the main dataset covers 1985 to 2016, we found that data for Poland is only available from 1990 to 2015.
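The years available for one country can be checked after filtering, together with any gaps inside that range. This is a minimal sketch on invented data (`toy` and its values are hypothetical); applied to `suicide_Poland`, the same two steps would confirm the range reported here.

```r
library(dplyr)

# Hypothetical data with a deliberate gap (no 1992 row for Poland)
toy <- tibble(
  country     = c("Poland", "Poland", "Poland", "Albania"),
  year        = c(1990, 1991, 1993, 1987),
  suicides_no = c(5, 6, 7, 1)
)

poland <- toy %>% filter(country == "Poland")

# First and last year with data
year_range <- range(poland$year)

# Years inside that range that have no rows at all
missing_years <- setdiff(seq(year_range[1], year_range[2]), poland$year)

print(year_range)
print(missing_years)
```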
The analysis gives a visual overview of the overall suicide trend, the comparison between males and females, and the suicide rate per 100k population. In addition, it covers the suicide trend by age group and by generation.
The plot below shows the overall trend in the number of suicides in Poland between 1990 and 2015.
Poland_year_wise <- suicide_Poland %>%
group_by(year, country) %>%
summarise(total_suicides = sum(suicides_no))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
options(repr.plot.width=15.8,repr.plot.height=8)
Poland_year_wise %>%
ggplot(aes(x = year, y = total_suicides)) +
geom_point(size = 1, color = 'purple')+
theme_wsj(base_size = 10)+
scale_x_continuous(n.breaks = nrow(Poland_year_wise), limits =
c(1990,2015))+
theme(axis.text.x = element_text(angle=45))+
scale_y_continuous(n.breaks = 10)+
labs( x = 'Year', y = "Number of Suicides",
title = 'Overall Trend of Suicides in Poland',
subtitle = 'From 1990 to 2015')+
theme(plot.title=element_text(size=21),
plot.subtitle=element_text(face="italic",size=13)
)+ geom_smooth(method = 'lm')
## `geom_smooth()` using formula 'y ~ x'
From the output, we can see that the number of suicides peaked in 2010.
Afterwards, it fluctuated until 2015, when there was a
major decline.
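The peak year read off the plot can also be found programmatically. The following is a sketch under assumptions: `toy` is invented data with the dataset's column names, and `.groups = "drop"` is added so that `summarise()` returns an ungrouped result without printing the grouping message.

```r
library(dplyr)

# Hypothetical yearly counts split by sex; the values are invented
toy <- tibble(
  year        = c(2009, 2009, 2010, 2010, 2011, 2011),
  sex         = rep(c("male", "female"), 3),
  suicides_no = c(50, 10, 55, 12, 48, 9)
)

# Total suicides per year, returned as an ungrouped tibble
yearly <- toy %>%
  group_by(year) %>%
  summarise(total_suicides = sum(suicides_no), .groups = "drop")

# The row with the highest yearly total
peak <- yearly %>% slice_max(total_suicides, n = 1)

print(peak)
```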
In the plot below, we compare the suicide trend by sex (male and female).
Poland_gender_wise <- suicide_Poland %>%
group_by(year, sex) %>%
summarise(total_suicides = sum(suicides_no))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
Poland_gender_wise %>%
ggplot(aes(x = year, y = total_suicides, group = sex, color =sex)) +
geom_point(size = 1)+
scale_x_continuous(n.breaks = nrow(Poland_year_wise), limits =
c(1990,2015))+
theme_wsj(base_size = 10, color = 'blue')+
theme(axis.text.x = element_text(angle=45))+
scale_y_continuous(n.breaks = 7)+
scale_colour_wsj("colors6", "")+
labs(title = 'Gender-wise Trend of Suicides in Poland',
subtitle = 'From 1990 to 2015')+
theme(plot.title=element_text(size=19),
plot.subtitle=element_text(face="italic",size=13)
)+geom_smooth(method = 'lm')
## `geom_smooth()` using formula 'y ~ x'
The output of this plot is striking. It shows that over the
years, the number of suicides among men has been consistently and
substantially higher than among women.
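The size of the gap between the sexes can be quantified as a male-to-female ratio per year. The sketch below assumes invented data (`toy`) and uses `tidyr::pivot_wider()` to place the two sexes side by side; it illustrates the idea and is not a result from the Poland data.

```r
library(dplyr)
library(tidyr)

# Hypothetical counts; the values are invented for illustration
toy <- tibble(
  year        = c(2000, 2000, 2001, 2001),
  sex         = c("male", "female", "male", "female"),
  suicides_no = c(60, 10, 56, 8)
)

ratio <- toy %>%
  group_by(year, sex) %>%
  summarise(total = sum(suicides_no), .groups = "drop") %>%
  # One column per sex so the two totals can be divided
  pivot_wider(names_from = sex, values_from = total) %>%
  mutate(male_to_female = male / female)

print(ratio)
```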
In the plot below, we analyse the suicide rate per 100k population in Poland on a year-by-year basis.
suicide_Poland %>%
group_by(year) %>%
summarize(population = sum(population),
suicides = sum(suicides_no),
suicides_per_100k = (suicides / population) * 100000) %>%
ggplot(aes(x = year, y = suicides_per_100k)) +
geom_point(col = "deepskyblue3", size = 2) +
labs(title = "Poland Suicides (per 100k)",
subtitle = "Trend over time, 1990 - 2015.",
x = "Year",
y = "Suicides per 100k") +
scale_x_continuous(breaks = seq(1985, 2015, 2)) +
scale_y_continuous(breaks = seq(10, 20))+geom_smooth(method = 'lm')
## `geom_smooth()` using formula 'y ~ x'
The peak suicide rate was 17.9 deaths per 100k in 2009. It then
fell to 17.2 per 100k in 2010 (a decrease of about 4%). In 2015, there was a
major decline in the death rate per 100k.
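The relative change between two yearly rates is simple arithmetic and easy to verify. The snippet below uses the two rates quoted in the text (17.9 and 17.2 per 100k); the helper `per_100k()` is a hypothetical name that mirrors the formula used in the plotting code, and its inputs are made up.

```r
# Rate per 100k, as in the plotting code: (suicides / population) * 100000
per_100k <- function(suicides, population) {
  suicides / population * 100000
}

# Made-up inputs: 179 suicides in a population of one million
example_rate <- per_100k(179, 1e6)

# Rates quoted in the text
rate_2009 <- 17.9
rate_2010 <- 17.2

# Percentage change from 2009 to 2010 (negative means a decrease)
pct_change <- (rate_2010 - rate_2009) / rate_2009 * 100

print(example_rate)
print(round(pct_change, 1))
```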
This analysis shows the distribution of the number of suicides across age groups in Poland between 1990 and 2015.
suicide_Poland %>%
ggplot(aes( y = fct_reorder(age, suicides_no), x = suicides_no))+
geom_boxplot(fill = 'red')+
theme_bw(base_size = 12)+
labs(title = 'Distribution of Suicides Rate among Different Age Groups',
subtitle = 'From 1990 to 2015', y = 'Age Groups',
x = 'Number of Suicides')+
theme(plot.title=element_text(size=19),
plot.subtitle=element_text(face="italic",size=13))
Based on the output, we can see that the age group with the highest
number of suicides is 35-54 years, while the lowest is 5-14 years. This
matches our expectation, as this is the age at which most adults
carry heavy responsibilities from both work and
family.
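The ranking read off the boxplot can be backed by a per-group summary. This is a sketch on invented data (`toy`); the age labels follow the dataset description, and the explicit `age_levels` vector is an assumption used to give the groups their natural order.

```r
library(dplyr)

# Natural ordering of the age groups (labels as in the dataset description)
age_levels <- c("5-14 years", "15-24 years", "25-34 years",
                "35-54 years", "55-74 years", "75+ years")

# Hypothetical rows; the values are invented for illustration
toy <- tibble(
  age         = c("35-54 years", "5-14 years", "75+ years",
                  "35-54 years", "5-14 years", "75+ years"),
  suicides_no = c(900, 10, 150, 1100, 20, 170)
)

by_age <- toy %>%
  mutate(age = factor(age, levels = age_levels)) %>%
  group_by(age) %>%
  summarise(
    median_suicides = median(suicides_no),
    total_suicides  = sum(suicides_no),
    .groups = "drop"
  ) %>%
  # Highest median first, matching how the boxplot is read
  arrange(desc(median_suicides))

print(by_age)
```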
The generation-wise comparison, which is closely related to the age comparison (since generation is derived from age), shows the total number of suicides for each generation.
Generation<-suicide_Poland%>%
group_by(`generation`)%>%
summarize("Total suicides"=sum(suicides_no))%>%arrange(-`Total suicides`)
ggplot(data=Generation,aes
(x=reorder(`generation`,-`Total suicides`),
y=`Total suicides`,fill=`generation`))+
geom_col()+
theme_wsj()+
scale_fill_brewer(palette="Pastel1")+
labs(title="Suicide rates of Poland by generation",
subtitle="From 1990 to 2015")+
geom_label(aes(`generation`,`Total suicides`,label=`Total suicides`))+
theme(legend.position="none",
plot.title=element_text(size=19),
plot.subtitle=element_text(face="italic",size=13)
)
The output shows that, over the years, Boomers have had more suicides than any other generation. However, this comparison may be misleading: Generation Z only started in 1997, so few of its members had been born, and fewer still had reached the higher-risk age groups, within the period covered by the dataset.
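One way to reduce the bias from unequal generation sizes is to compare suicides per 100k population rather than raw totals. The sketch below is an assumption about how this could be done, not part of the original analysis: `toy` is invented data, and summing `population` over rows treats each row as a separate population-year.

```r
library(dplyr)

# Hypothetical rows; the values are invented for illustration
toy <- tibble(
  generation  = c("Boomers", "Boomers", "Generation Z", "Generation Z"),
  suicides_no = c(3000, 3200, 20, 30),
  population  = c(10000000, 10000000, 500000, 500000)
)

by_generation <- toy %>%
  group_by(generation) %>%
  summarise(
    suicides   = sum(suicides_no),
    population = sum(population),
    .groups = "drop"
  ) %>%
  # Normalise by population so generations of different size are comparable
  mutate(suicides_per_100k = suicides / population * 100000)

print(by_generation)
```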