I have chosen a dataset from Kaggle that compares socio-economic information with suicide rates by year and country. My aim is to find signals correlated with increased suicide rates among different cohorts globally, across the socio-economic spectrum.
The World Bank strives to enhance public access to and use of the data that it collects and publishes. The data are organized in datasets listed in the World Bank Data Catalog. These datasets are collections of data, managed by the World Bank and provided in a number of machine-readable formats.
This dataset contains the variables country, year, age, sex, number of suicides, population, HDI for year, GDP for year, GDP per capita, and generation. It covers the years 1985-2016 and has 27,820 observations.
GDP per capita is a measure of a country’s economic output that accounts for its number of people: it divides the country’s gross domestic product by its total population. That makes it a widely used measure of a country’s standard of living, indicating how prosperous a country feels to each of its citizens.
HDI: Published on 4 November 2010 (and updated on 10 June 2011), the 2010 Human Development Report calculated the HDI by combining three dimensions. A long and healthy life: life expectancy at birth. Education index: mean years of schooling and expected years of schooling. A decent standard of living: GNI per capita (PPP US$).
I will be using linear regression (LM) for my research questions.
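For reference, the two indicators described above can be written compactly. This is a sketch under stated assumptions: the symbols I_health, I_education and I_income are my own notation for the three normalized dimension indices, and the geometric-mean form reflects how the 2010 report aggregates them.

```latex
\text{GDP per capita} = \frac{\text{GDP}}{\text{population}}
\qquad
\text{HDI} = \left( I_{\text{health}} \cdot I_{\text{education}} \cdot I_{\text{income}} \right)^{1/3}
```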
The statistical update is released to ensure consistency in reporting on key human development indices and statistics. It includes an analysis of the state of human development: snapshots of current conditions as well as long-term trends in human development indicators.
The overall trend globally is toward continued human development improvements, with many countries moving up through the human development categories: out of the 189 countries for which the HDI is calculated, 59 countries are today in the very high human development group and only 38 countries fall in the low HDI group. Just eight years ago in 2010, the figures were 46 and 49 countries respectively.
What percentage of suicides globally are among men and among women?
In which years is the number of suicides highest among men and women?
What is the distribution of suicides among the different age groups?
What are the suicide rates among the different generations?
Looking at suicides vs. population!
Looking at suicide trends over the years!
Looking at the median suicide rate by age group and generation! Inspecting suicides globally: do third-world countries tend to have more suicides, corresponding to lower standards of living, or do first-world countries also have significant numbers of suicides?
Suicides are far more common among men than among women: men account for 76.9% of suicides and women for only 23.1%. The percentage of suicides among men is highest in the years 1995-2000.
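A minimal Python sketch of how such shares can be computed, assuming each record carries a sex label and a suicide count. The toy rows and the helper name `share_by_group` are hypothetical illustrations, not the dataset or the code used for this analysis.

```python
from collections import defaultdict

def share_by_group(rows, group_key, value_key):
    """Return each group's percentage share of the summed value."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    grand_total = sum(totals.values())
    return {group: 100.0 * total / grand_total for group, total in totals.items()}

# Hypothetical toy records, not values from the dataset.
rows = [
    {"sex": "male", "suicides_no": 30},
    {"sex": "female", "suicides_no": 10},
    {"sex": "male", "suicides_no": 45},
    {"sex": "female", "suicides_no": 15},
]
shares = share_by_group(rows, "sex", "suicides_no")
print(shares)
```

The same helper works for any grouping column, for example age group or generation.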
I create a new variable, suicideRate, that contains the number of suicides as a percentage of the population. This helps when comparing countries with extremely large or extremely small populations.
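The derived variable can be sketched in Python as follows. This assumes suicideRate is the suicide count divided by the population, expressed as a percentage. The function name `suicide_rate` and the two toy rows are hypothetical, chosen only to show why a rate is easier to compare than a raw count.

```python
def suicide_rate(suicides, population):
    """Suicides as a percentage of population; None if population is missing or zero."""
    if not population:
        return None
    return 100.0 * suicides / population

# Hypothetical toy records: country "A" has more suicides, but "B" has the higher rate.
rows = [
    {"country": "A", "suicides_no": 50, "population": 1_000_000},
    {"country": "B", "suicides_no": 5, "population": 20_000},
]
for row in rows:
    row["suicideRate"] = suicide_rate(row["suicides_no"], row["population"])
    print(row["country"], row["suicideRate"])
```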
I’ll be using ggplot to generate a few visualizations in order to get a better understanding of the underlying patterns in the data set. The first addresses my research question about the distribution of suicides among the different age groups.
There is a very notable spike right at the beginning of the histogram for the youngest age group, which corresponds to the fact that many countries have low suicide rates among the youth.
Notably, the 75+ age group appears to have a slightly heavier tail than the other age groups, indicating that some countries have high suicide rates among the older age groups.
I also use an additional aesthetic in the plot to look at the distribution of age groups within generations, mainly because I’ve always been curious as to which age group belonged to generation X, which age group belonged to generation Y, etc.
This graph largely resembles the previous graph, with major spikes occurring among the younger generations.
My dataset contains a ‘year’ variable, so I’d love to explore the time-related aspect of the data. Here I inspect the relationship between year and suicideRate.
The rates show a negative trend over time, especially for the 75+ years age group. This is encouraging news. However, keep in mind that it does not necessarily mean that the number of suicides has dropped, only that the rates have dropped. Population growth could easily account for this, rather than improved social and cultural factors.
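The distinction between counts and rates can be checked with a little arithmetic. In this Python sketch all numbers are hypothetical: the suicide count rises between two periods, yet the rate falls because the population grows faster.

```python
def rate_percent(suicides, population):
    """Suicides as a percentage of population."""
    return 100.0 * suicides / population

# Hypothetical figures for one country in two periods (not from the dataset).
early_suicides, early_population = 1_000, 5_000_000
late_suicides, late_population = 1_100, 6_000_000

early_rate = rate_percent(early_suicides, early_population)
late_rate = rate_percent(late_suicides, late_population)

print("count rose:", late_suicides > early_suicides)
print("rate fell:", late_rate < early_rate)
```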
When it comes to inspecting suicides globally, a key question that comes to mind is whether third-world countries tend to have more suicides, corresponding to lower standards of living, or whether first-world countries also have significant numbers of suicides.
I am looking at the average suicides for each country to see the relationship with low standards of living.
The United States and Japan have the highest average number of suicides, in stark contrast to their high standards of living.
Previously, I looked at the distribution of suicide rates across different age groups/generations. Now, I am looking at the median number of suicides per age group and generation.
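A grouped median of this kind can be sketched in Python with the standard library. The helper name `median_by_group`, the age labels and the counts are hypothetical toy values. The same call with a generation column would give the per-generation medians.

```python
from collections import defaultdict
from statistics import median

def median_by_group(rows, group_key, value_key):
    """Return the median of value_key within each group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[group_key]].append(row[value_key])
    return {group: median(values) for group, values in grouped.items()}

# Hypothetical toy records, not values from the dataset.
rows = [
    {"age": "15-24 years", "suicides_no": 4},
    {"age": "15-24 years", "suicides_no": 10},
    {"age": "15-24 years", "suicides_no": 7},
    {"age": "35-54 years", "suicides_no": 20},
    {"age": "35-54 years", "suicides_no": 30},
]
medians = median_by_group(rows, "age", "suicides_no")
print(medians)
```

The median is less sensitive than the mean to a few very large countries, which is one reason to prefer it for skewed counts.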
Amidst all the news of millennial suicides, I find that it is actually the 35-54 age group that has the highest average suicides in the data.
According to the graph below, suicide rates tend to increase on average as the population increases. However, the variability increases drastically as well, which shows that suicide rates do not depend solely on population, i.e. the relationship is not significant.
Points are coloured red for more suicides and blue for fewer suicides.
The sizes of the points correspond to the median number of suicides.
The overall median number of suicides is 25.
This is a world map, coloured according to the average number of suicides per country. Note how dark the US, Japan, and Russia are; this corresponds directly to the earlier graph in which I plotted the same data as a bar plot. Surprisingly, the dataset has no information for countries like China and India, i.e. these countries are not available in the dataset.
Not all countries are included in the dataset, so the missing countries cannot be analyzed.
1-Suicides are higher among men than women.
2-Higher standards of living do not necessarily indicate lower suicide rates.
3-Suicide numbers were decreasing up until 2010, after which they started rising again.
4-Middle-aged people are more likely to commit suicide globally.
5-The year, sex, and population are strong indicators when used to predict the number of suicides for a country.
6-There is a weak positive relationship between a country’s GDP (per capita) and suicide rate.
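As an illustration of the kind of linear model behind the last two findings, here is a minimal Python sketch of ordinary least squares with a single predictor. It is not the model fitted in this analysis, which used several predictors. The helper names `fit_line` and `r_squared` and all data values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(xs, ys, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy values in arbitrary units, not taken from the dataset.
gdp_per_capita = [1.0, 2.0, 3.0, 4.0, 5.0]
suicide_rate = [2.0, 2.5, 2.0, 3.5, 3.0]

intercept, slope = fit_line(gdp_per_capita, suicide_rate)
r2 = r_squared(gdp_per_capita, suicide_rate, intercept, slope)
print(intercept, slope, r2)
```

A positive slope together with a modest R-squared is what a weak positive relationship looks like in such a fit: the trend is upward, but much of the variation is left unexplained.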
-https://www.kaggle.com/russellyates88/suicide-rates-overview-1985-to-2016
-United Nations Development Program. (2018). Human development index (HDI). Retrieved from http://hdr.undp.org/en/indicators/137506
-World Bank. (2018). World development indicators: GDP (current US$) by country: 1985 to 2016. Retrieved from http://databank.worldbank.org/data/source/world-development-indicators#
-World Health Organization. (2018). Suicide prevention. Retrieved from http://www.who.int/mental_health/suicide-prevention/en/