This dataset was compiled from four other datasets linked by time and place. It was built to find signals correlated with increased suicide rates among different cohorts globally, across the socio-economic spectrum.
library(knitr)
kable(SuicideRates[1:5,], caption = "dataset parameters")
| п.їcountry | year | sex | age | suicides_no | population | suicides.100k.pop | country.year | HDI.for.year | gdp_for_year.... | gdp_per_capita.... | generation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Albania | 1987 | male | 15-24 years | 21 | 312900 | 6.71 | Albania1987 | NA | 2,156,624,900 | 796 | Generation X |
| Albania | 1987 | male | 35-54 years | 16 | 308000 | 5.19 | Albania1987 | NA | 2,156,624,900 | 796 | Silent |
| Albania | 1987 | female | 15-24 years | 14 | 289700 | 4.83 | Albania1987 | NA | 2,156,624,900 | 796 | Generation X |
| Albania | 1987 | male | 75+ years | 1 | 21800 | 4.59 | Albania1987 | NA | 2,156,624,900 | 796 | G.I. Generation |
| Albania | 1987 | male | 25-34 years | 9 | 274300 | 3.28 | Albania1987 | NA | 2,156,624,900 | 796 | Boomers |
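The `п.їcountry` header in the table above looks like a UTF-8 byte-order mark (BOM) that was read as part of the first column name. The sketch below shows one way to avoid that at import time, assuming the data comes from a CSV file saved with a BOM. The temporary file and its two-row contents are stand-ins for illustration, not the author's actual file or loading code.

```r
# Write a tiny stand-in CSV that starts with a UTF-8 byte-order mark (EF BB BF).
# The contents are illustrative; the real file name is not given in the source.
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
body <- charToRaw("country,year,sex\nAlbania,1987,male\nAlbania,1987,female\n")
writeBin(c(bom, body), tmp)

# fileEncoding = "UTF-8-BOM" asks R to drop the BOM before parsing the header,
# so the first column is named "country" rather than a garbled variant.
demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)
print(names(demo))

unlink(tmp)
```

If re-importing is not an option, renaming the first column in place with `names(SuicideRates)[1] <- "country"` should have the same effect, though any code that refers to `п.їcountry` would then need to use `country` instead.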
SuicideRates.dim <- dim(SuicideRates)
The dataset contains 27,820 rows and 12 columns.
str(SuicideRates)
## 'data.frame': 27820 obs. of 12 variables:
## $ п.їcountry : chr "Albania" "Albania" "Albania" "Albania" ...
## $ year : int 1987 1987 1987 1987 1987 1987 1987 1987 1987 1987 ...
## $ sex : chr "male" "male" "female" "male" ...
## $ age : chr "15-24 years" "35-54 years" "15-24 years" "75+ years" ...
## $ suicides_no : int 21 16 14 1 9 1 6 4 1 0 ...
## $ population : int 312900 308000 289700 21800 274300 35600 278800 257200 137500 311000 ...
## $ suicides.100k.pop : num 6.71 5.19 4.83 4.59 3.28 2.81 2.15 1.56 0.73 0 ...
## $ country.year : chr "Albania1987" "Albania1987" "Albania1987" "Albania1987" ...
## $ HDI.for.year : num NA NA NA NA NA NA NA NA NA NA ...
## $ gdp_for_year.... : chr "2,156,624,900" "2,156,624,900" "2,156,624,900" "2,156,624,900" ...
## $ gdp_per_capita....: int 796 796 796 796 796 796 796 796 796 796 ...
## $ generation : chr "Generation X" "Silent" "Generation X" "G.I. Generation" ...
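The `str()` output shows `gdp_for_year....` stored as character because the values contain thousands separators, so it cannot be used in numeric summaries or correlations as it stands. A minimal sketch of a conversion, using the value shown in the output above (the variable names `gdp_chr` and `gdp_num` are illustrative):

```r
# Value copied from the str() output; the column is text because of the commas.
gdp_chr <- c("2,156,624,900", "2,156,624,900")

# Strip the commas, then convert. as.numeric() is used rather than as.integer()
# because values this large exceed R's 32-bit integer range.
gdp_num <- as.numeric(gsub(",", "", gdp_chr, fixed = TRUE))
print(gdp_num)
```

Applied to the data frame, the same idea would read `as.numeric(gsub(",", "", SuicideRates$gdp_for_year...., fixed = TRUE))`.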
summary(SuicideRates)
## п.їcountry year sex age
## Length:27820 Min. :1985 Length:27820 Length:27820
## Class :character 1st Qu.:1995 Class :character Class :character
## Mode :character Median :2002 Mode :character Mode :character
## Mean :2001
## 3rd Qu.:2008
## Max. :2016
##
## suicides_no population suicides.100k.pop country.year
## Min. : 0.0 Min. : 278 Min. : 0.00 Length:27820
## 1st Qu.: 3.0 1st Qu.: 97498 1st Qu.: 0.92 Class :character
## Median : 25.0 Median : 430150 Median : 5.99 Mode :character
## Mean : 242.6 Mean : 1844794 Mean : 12.82
## 3rd Qu.: 131.0 3rd Qu.: 1486143 3rd Qu.: 16.62
## Max. :22338.0 Max. :43805214 Max. :224.97
##
## HDI.for.year gdp_for_year.... gdp_per_capita.... generation
## Min. :0.483 Length:27820 Min. : 251 Length:27820
## 1st Qu.:0.713 Class :character 1st Qu.: 3447 Class :character
## Median :0.779 Mode :character Median : 9372 Mode :character
## Mean :0.777 Mean : 16866
## 3rd Qu.:0.855 3rd Qu.: 24874
## Max. :0.944 Max. :126352
## NA's :19456
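The summary reports 19456 missing values in `HDI.for.year` out of 27820 rows, which is roughly 70% of the column. The sketch below reproduces that share from the two counts and shows a small helper for checking every column at once. The helper name `na_share` and the toy data frame are illustrative assumptions, not part of the original analysis.

```r
# Counts taken from the summary() and str() output above.
n_rows <- 27820
n_missing_hdi <- 19456

# Share of rows with no HDI value.
missing_share <- n_missing_hdi / n_rows
print(round(missing_share, 3))

# Hypothetical helper: fraction of missing values in each column of a data frame.
na_share <- function(df) colMeans(is.na(df))

# Toy data frame for illustration only, not rows from the dataset.
toy <- data.frame(a = c(1, NA, 3, NA), b = c("x", "y", NA, "z"))
print(na_share(toy))
```

On the real data, `na_share(SuicideRates)` would show at a glance which columns are complete enough to use in later steps.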
library(ggplot2)
ggplot(SuicideRates, aes(x=suicides.100k.pop)) + geom_histogram(binwidth = 1)
ggplot(SuicideRates, aes(x=year, y=suicides_no)) + geom_point()
ggplot(SuicideRates, aes(x=generation, y=suicides_no)) + geom_point()
ggplot(SuicideRates, aes(x=suicides.100k.pop, y=п.їcountry)) + geom_point()
pairs(~ year + suicides_no + population + suicides.100k.pop + HDI.for.year + gdp_per_capita...., data = SuicideRates, main = 'Suicide Rates Data')
## corrplot 0.84 loaded
## year suicides_no population suicides.100k.pop
## year 1.000000000 -0.004545958 0.008850170 -0.039036797
## suicides_no -0.004545958 1.000000000 0.616162268 0.306604451
## population 0.008850170 0.616162268 1.000000000 0.008284973
## suicides.100k.pop -0.039036797 0.306604451 0.008284973 1.000000000
## gdp_per_capita.... 0.339134280 0.061329749 0.081509858 0.001785134
## gdp_per_capita....
## year 0.339134280
## suicides_no 0.061329749
## population 0.081509858
## suicides.100k.pop 0.001785134
## gdp_per_capita.... 1.000000000
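The code that produced the correlation matrix above is not shown in the report. A sketch of how such a matrix could be computed, assuming Pearson correlation via base R `cor()` on the numeric columns: the helper name `cor_matrix` and the synthetic data are assumptions for illustration, and the `corrplot` call runs only if that package is installed.

```r
# Hypothetical helper: Pearson correlation matrix for selected numeric columns.
cor_matrix <- function(df, cols) {
  cor(df[, cols], method = "pearson", use = "complete.obs")
}

# Synthetic stand-in data so the sketch runs on its own; not the real dataset.
set.seed(1)
toy <- data.frame(
  year = sample(1985:2016, 50, replace = TRUE),
  suicides_no = rpois(50, 25),
  population = round(runif(50, 1e4, 1e6))
)
m <- cor_matrix(toy, c("year", "suicides_no", "population"))
print(round(m, 3))

# Optional plot, drawn only if the corrplot package happens to be available.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(m, method = "circle")
}
```

With the real data, the call would presumably be `cor_matrix(SuicideRates, c("year", "suicides_no", "population", "suicides.100k.pop", "gdp_per_capita...."))`, matching the five variables in the printed matrix.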
## Loading required package: lattice
## year suicides_no population suicides.100k.pop
## Min. :0.0000 Min. :0.0000000 Min. :0.000000 Min. :0.000000
## 1st Qu.:0.3226 1st Qu.:0.0001343 1st Qu.:0.002219 1st Qu.:0.004089
## Median :0.5484 Median :0.0011192 Median :0.009813 Median :0.026626
## Mean :0.5245 Mean :0.0108593 Mean :0.042107 Mean :0.056968
## 3rd Qu.:0.7419 3rd Qu.:0.0058644 3rd Qu.:0.033920 3rd Qu.:0.073876
## Max. :1.0000 Max. :1.0000000 Max. :1.000000 Max. :1.000000
## gdp_per_capita....
## Min. :0.00000
## 1st Qu.:0.02534
## Median :0.07233
## Mean :0.13176
## 3rd Qu.:0.19526
## Max. :1.00000
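Every column in the summary above runs from 0 to 1, which is consistent with min-max scaling. The report does not show the code used, so the following is a base R sketch of the technique with a hypothetical helper name (`min_max`). Applied to the `year` quartiles from the earlier `summary()` output, it reproduces the scaled quartiles printed above (0.3226, 0.5484, 0.7419).

```r
# Min-max scaling: maps the smallest value to 0 and the largest to 1.
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Min, quartiles, and max of `year` from the earlier summary() output.
year_quartiles <- c(1985, 1995, 2002, 2008, 2016)
scaled <- min_max(year_quartiles)
print(round(scaled, 4))
```

To scale several columns at once, something like `as.data.frame(lapply(df[, cols], min_max))` should work, where `cols` names the numeric columns.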
The exploratory data analysis yields three results: the main statistical characteristics of the dataset's features, an assessment of whether noise needs to be removed from the data, and the identification of highly correlated parameters to eliminate.
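As an illustration of the last point, the sketch below rebuilds the correlation matrix printed earlier and lists variable pairs whose absolute correlation exceeds a threshold. The cutoff of 0.6 is an assumption chosen for the example; the report does not state which threshold, if any, was applied.

```r
# Correlation values copied from the matrix printed in the report.
vars <- c("year", "suicides_no", "population",
          "suicides.100k.pop", "gdp_per_capita....")
m <- matrix(c(
   1.000000000, -0.004545958, 0.008850170, -0.039036797, 0.339134280,
  -0.004545958,  1.000000000, 0.616162268,  0.306604451, 0.061329749,
   0.008850170,  0.616162268, 1.000000000,  0.008284973, 0.081509858,
  -0.039036797,  0.306604451, 0.008284973,  1.000000000, 0.001785134,
   0.339134280,  0.061329749, 0.081509858,  0.001785134, 1.000000000
), nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

cutoff <- 0.6  # assumed threshold, not taken from the report

# Look only at the upper triangle so each pair is reported once.
idx <- which(abs(m) > cutoff & upper.tri(m), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(m)[idx[, "row"]],
  var2 = colnames(m)[idx[, "col"]],
  r = m[idx],
  stringsAsFactors = FALSE
)
print(high_pairs)
```

At this cutoff the only flagged pair is `suicides_no` and `population`, which is expected because the raw count grows with population size while `suicides.100k.pop` already adjusts for it. The `caret` package provides `findCorrelation()` as a packaged alternative for this kind of screening.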