The main dataset for our project is combined from several datasets produced by the United Nations Development Program, the World Bank, Kaggle, and the World Health Organization. The dataset can be found here. It has records from 1985 to 2016. However, since there is very little data for 2016, we keep only the range from 1985 to 2015. The dataset has 27820 observations and 10 features. The features we are interested in include:
Features
Country
Year : from 1985 to 2015
Sex
Age : Age groups including “5-14”, “15-24”, “25-34”, “35-54”, “55-74”, and “75+”.
Suicides_no : Number of suicides
Population
GDP_per_capita : ratio between the country’s GDP and its population
Continent
Apart from those, we create another variable called suicides_per_100k, obtained by dividing Suicides_no by Population and multiplying by 100,000.
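As a minimal sketch of how this derived variable could be computed, the Python snippet below scales a raw count to a rate per 100,000 people. The helper name and the example record are made up for illustration; only the column names Suicides_no and Population come from the dataset description.

```python
# Hypothetical helper: converts a raw suicide count into a rate per 100,000 people.
def suicides_per_100k(suicides_no: int, population: int) -> float:
    if population <= 0:
        raise ValueError("population must be positive")
    return suicides_no / population * 100_000


# Made-up illustrative record, not taken from the dataset.
row = {"Suicides_no": 21, "Population": 312_900}
row["suicides_per_100k"] = suicides_per_100k(row["Suicides_no"], row["Population"])
print(round(row["suicides_per_100k"], 2))  # prints 6.71
```

Guarding against a non-positive population avoids a division by zero on incomplete records.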
On average, there are 74.4 countries in the dataset per year. The graph below shows the distribution of countries from 1995 to 2015. Although the number of countries before 1995 is considerably lower, after that the number of countries per year is stable at around 80.
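The per-year country count and its average could be obtained roughly as follows. This is a sketch in Python on made-up (country, year) pairs; it assumes that a country counts once per year no matter how many sex and age rows it contributes.

```python
from collections import defaultdict
from statistics import mean

# Made-up (country, year) pairs standing in for the Country and Year columns.
records = [
    ("Albania", 1987), ("Albania", 1987), ("Austria", 1987),
    ("Albania", 1995), ("Austria", 1995), ("Brazil", 1995),
]

countries_by_year = defaultdict(set)
for country, year in records:
    # A set ignores repeated rows for the same country within a year.
    countries_by_year[year].add(country)

counts = {year: len(names) for year, names in sorted(countries_by_year.items())}
average = mean(counts.values())
print(counts, average)
```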
We plot the estimated total suicides per 100k population across all countries over time. As we can see, the number of suicides peaks in 1995 with 243544 cases. In recent years, however, the estimated suicides have been decreasing.
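One way to build such a yearly series is to pool suicides and population over all countries before dividing, so that larger countries weigh more. The source does not say whether the rate was pooled or averaged per country, so pooling is an assumption here, and the rows are made up.

```python
from collections import defaultdict

# Made-up (year, suicides_no, population) rows.
rows = [
    (1995, 30, 200_000),
    (1995, 10, 200_000),
    (2015, 12, 300_000),
    (2015, 9, 400_000),
]

totals = defaultdict(lambda: [0, 0])  # year -> [total suicides, total population]
for year, suicides, population in rows:
    totals[year][0] += suicides
    totals[year][1] += population

# Pooled rate per 100k for each year, in chronological order.
yearly_rate = {y: s / p * 100_000 for y, (s, p) in sorted(totals.items())}
peak_year = max(yearly_rate, key=yearly_rate.get)
print(yearly_rate, peak_year)
```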
Globally, the suicide rate has been ~3.5 times higher for men, and this trend has remained constant since the mid-90s. Both the male and female suicide rates peaked in 1995.
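The male-to-female ratio can be checked year by year by computing a pooled rate for each sex and dividing one by the other. The sketch below uses made-up numbers chosen so that the ratio is 3.5 in both years; the tuple layout is an assumption for illustration.

```python
from collections import defaultdict

# Made-up (year, sex, suicides_no, population) rows.
rows = [
    (1995, "male", 70, 1_000_000), (1995, "female", 20, 1_000_000),
    (2015, "male", 42, 1_200_000), (2015, "female", 12, 1_200_000),
]

totals = defaultdict(lambda: [0, 0])  # (year, sex) -> [suicides, population]
for year, sex, suicides, population in rows:
    totals[(year, sex)][0] += suicides
    totals[(year, sex)][1] += population


def rate(year: int, sex: str) -> float:
    """Pooled suicides per 100k for one year and sex."""
    suicides, population = totals[(year, sex)]
    return suicides / population * 100_000


years = sorted({row[0] for row in rows})
ratio_by_year = {y: rate(y, "male") / rate(y, "female") for y in years}
print(ratio_by_year)
```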
The ratio between male and female suicide rates has remained constant across all continents and age groups. The likelihood of suicide also increases with age, and this trend holds across all continents and both sexes.
In Oceania, the number of suicides is higher among people aged 15-24 and 25-34, whereas in other continents suicide rates are higher in the 55-74 and 75+ age groups.
As can be seen from the plots below, the suicide rate is decreasing in Asia and Europe but increasing in America and Oceania. Since 1995, the suicide rate has been relatively constant in Africa.
After 1995, which saw the highest number of cases, suicide rates have been steadily decreasing across all age groups except the 5-14 group, where the rate is nearly constant.
Lithuania’s rate has been the highest, at around 41 suicides per 100k population, followed by the Russian Federation and Sri Lanka. European countries are heavily overrepresented among those with high suicide rates.
Here we are interested in the change in suicide rates between 1995 and 2014. These two years were selected because they give a large representation of countries across continents. We also ignore a few countries whose suicide rates barely changed between 1995 and 2014.
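A comparison of this kind could be sketched as below. The source does not state which year serves as the baseline for the percentage or how "almost no change" was defined, so the 1995 baseline and the 5% cutoff are both assumptions. The country names and rates are placeholders.

```python
def percent_change(rate_1995: float, rate_2014: float) -> float:
    """Percentage change from 1995 to 2014, using 1995 as the baseline (an assumption)."""
    if rate_1995 == 0:
        raise ValueError("baseline rate must be non-zero")
    return (rate_2014 - rate_1995) / rate_1995 * 100


# Placeholder countries with made-up (rate in 1995, rate in 2014) per 100k.
rates = {
    "Country A": (20.0, 10.0),
    "Country B": (10.0, 15.0),
    "Country C": (8.0, 8.1),
}

MIN_ABS_CHANGE = 5.0  # hypothetical cutoff (in %) for "almost no change"

changes = {name: percent_change(*pair) for name, pair in rates.items()}
notable = {name: ch for name, ch in changes.items() if abs(ch) >= MIN_ABS_CHANGE}
print(notable)
```

With a 1995 baseline, a decrease cannot exceed 100%, so which year is used as the baseline matters when reading the percentages.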
Most European countries show a large decrease in suicide rates compared with countries on other continents. Portugal, the United Kingdom and Malta are the few European countries where suicide rates increased. Estonia shows the largest decrease, at almost 125%, followed by the Russian Federation and Latvia with 121% and 113% respectively.
The Republic of Korea, Japan, Georgia and Qatar are the countries with an increase in suicide rates. The Republic of Korea particularly stands out with the highest increase among all countries: its suicide rate rose by 147%.
The Americas are a peculiar case: more than two thirds of the countries show an increase in suicide rates, which is not the trend in other continents. Suriname has the highest increase at 82.2%, and Cuba has the steepest decrease at 64.4%.
As observed from the graph below, there is a weak positive linear relationship between suicide rates and a country’s GDP per capita. That is, richer countries are associated with higher rates of suicide.
The points at the top left and bottom right are exceptional cases (outliers) that heavily influence the regression line, so they are removed.
##
## Call:
## lm(formula = suicide_per_100k ~ gdp_per_capita, data = gdp_suicide_no_outliers)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -12.404  -4.977  -2.112   5.577  19.782
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)    9.155e+00  3.472e-03  2637.3   <2e-16 ***
## gdp_per_capita 1.175e-04  1.459e-07   804.8   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.765 on 8450926 degrees of freedom
## Multiple R-squared: 0.07119, Adjusted R-squared: 0.07119
## F-statistic: 6.477e+05 on 1 and 8450926 DF, p-value: < 2.2e-16
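With the outliers removed, the positive linear relation between GDP per capita and suicide rates is clearer: the slope is positive and statistically significant, although the R-squared of 0.07119 means GDP per capita explains only about 7% of the variation in suicide rates.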
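The output above comes from R's lm. To make the reported quantities concrete, here is a minimal Python sketch of the same ordinary least-squares computation for one predictor, returning the slope, intercept and R-squared. The function name and the data points are made up for illustration; this is not the original analysis code.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # R-squared: share of the variance in y explained by the fitted line.
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r2


# Made-up points: GDP per capita versus suicides per 100k.
gdp_per_capita = [10_000, 20_000, 30_000, 40_000]
suicide_per_100k = [10.0, 12.5, 11.5, 14.0]
slope, intercept, r2 = fit_line(gdp_per_capita, suicide_per_100k)
print(slope, intercept, r2)
```

A positive slope with a low R-squared describes a trend that exists on average while most of the spread in suicide rates comes from other factors.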