Data Preparation

The data is loaded from a GitHub repository.

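# load packages and data
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), scroll_box(), %>%
library(ggplot2)     # ggplot()

url <- "https://raw.githubusercontent.com/javernw/JWCUNYAssignments/master/master.csv"
master_file <- read.csv(url, stringsAsFactors = FALSE, header = TRUE)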
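
The `ï..country` column name that shows up in the output further down is what a UTF-8 byte order mark (BOM) at the start of a file turns into when it is read as plain text. A minimal sketch of one way to avoid it, assuming the file does begin with a BOM, is to pass `fileEncoding = "UTF-8-BOM"` to `read.csv()`. The temporary file and its two columns are made up so the example runs without network access.

```r
# Build a small stand-in CSV that begins with the UTF-8 BOM bytes EF BB BF.
# (Hypothetical file; the real data comes from the GitHub URL above.)
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("country,year\nAlbania,1987\n")), tmp)

# "UTF-8-BOM" tells R to drop the BOM instead of folding it into the first header.
demo <- read.csv(tmp, stringsAsFactors = FALSE, fileEncoding = "UTF-8-BOM")
print(names(demo))
```

The same argument can be added to the `read.csv(url, ...)` call above. Whether the first column then comes out as `country` depends on the file actually containing a BOM.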

Research question

Is a country's standard of living associated with higher or lower suicide rates?

Cases

Each case is the suicide rate for one combination of country, year, sex, and age group. There are 27,820 cases.
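
A quick way to check this definition of a case is to confirm that no combination of country, year, sex, and age group occurs twice. The sketch below uses three rows taken from the Albania 1987 data rather than the full file, and the column name `country` is an assumption (in the loaded data it appears as `ï..country`).

```r
# Three rows from the Albania 1987 data, keeping only the identifying columns.
cases <- data.frame(
  country = c("Albania", "Albania", "Albania"),
  year    = c(1987, 1987, 1987),
  sex     = c("male", "male", "female"),
  age     = c("15-24 years", "35-54 years", "15-24 years"),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns 0 when every row is a distinct combination.
key_cols <- c("country", "year", "sex", "age")
n_dupes <- anyDuplicated(cases[, key_cols])
print(n_dupes)
print(nrow(cases))
```

On the full data, `nrow(master_file)` should return the 27,820 cases stated above.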

kable(head(master_file, 20)) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), font_size = 11) %>%
  scroll_box(height = "500px")
ï..country  year  sex     age          suicides_no  population  suicides.100k.pop  country.year  HDI.for.year  gdp_for_year....  gdp_per_capita....  generation
Albania     1987  male    15-24 years  21           312900      6.71               Albania1987   NA            2,156,624,900     796                 Generation X
Albania     1987  male    35-54 years  16           308000      5.19               Albania1987   NA            2,156,624,900     796                 Silent
Albania     1987  female  15-24 years  14           289700      4.83               Albania1987   NA            2,156,624,900     796                 Generation X
Albania     1987  male    75+ years    1            21800       4.59               Albania1987   NA            2,156,624,900     796                 G.I. Generation
Albania     1987  male    25-34 years  9            274300      3.28               Albania1987   NA            2,156,624,900     796                 Boomers
Albania     1987  female  75+ years    1            35600       2.81               Albania1987   NA            2,156,624,900     796                 G.I. Generation
Albania     1987  female  35-54 years  6            278800      2.15               Albania1987   NA            2,156,624,900     796                 Silent
Albania     1987  female  25-34 years  4            257200      1.56               Albania1987   NA            2,156,624,900     796                 Boomers
Albania     1987  male    55-74 years  1            137500      0.73               Albania1987   NA            2,156,624,900     796                 G.I. Generation
Albania     1987  female  5-14 years   0            311000      0.00               Albania1987   NA            2,156,624,900     796                 Generation X
Albania     1987  female  55-74 years  0            144600      0.00               Albania1987   NA            2,156,624,900     796                 G.I. Generation
Albania     1987  male    5-14 years   0            338200      0.00               Albania1987   NA            2,156,624,900     796                 Generation X
Albania     1988  female  75+ years    2            36400       5.49               Albania1988   NA            2,126,000,000     769                 G.I. Generation
Albania     1988  male    15-24 years  17           319200      5.33               Albania1988   NA            2,126,000,000     769                 Generation X
Albania     1988  male    75+ years    1            22300       4.48               Albania1988   NA            2,126,000,000     769                 G.I. Generation
Albania     1988  male    35-54 years  14           314100      4.46               Albania1988   NA            2,126,000,000     769                 Silent
Albania     1988  male    55-74 years  4            140200      2.85               Albania1988   NA            2,126,000,000     769                 G.I. Generation
Albania     1988  female  15-24 years  8            295600      2.71               Albania1988   NA            2,126,000,000     769                 Generation X
Albania     1988  female  55-74 years  3            147500      2.03               Albania1988   NA            2,126,000,000     769                 G.I. Generation
Albania     1988  female  25-34 years  5            262400      1.91               Albania1988   NA            2,126,000,000     769                 Boomers

Data collection

Secondary data: this is quantitative research based on data extracted from online databases. The dataset was compiled from four sources (United Nations Development Program (HDI), World Bank, World Health Organization, and Szamil) to identify attributes that correlate with suicide rates globally.

Type of study

This is an observational study: the data were recorded as they occurred, without any intervention by the researcher.

Data Source

The data was found on Kaggle.com.

Dependent Variable

Suicide rate (suicides per 100,000 population, column suicides.100k.pop). Quantitative variable
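
The rate is the number of suicides per 100,000 people in the group. As a check, it can be recomputed from `suicides_no` and `population`. The values below are the first three Albania 1987 rows of the data shown earlier.

```r
# Counts and group populations from the first three Albania 1987 rows.
suicides_no <- c(21, 16, 14)
population  <- c(312900, 308000, 289700)

# Suicides per 100,000 population, rounded to two decimals as in the data.
rate <- round(suicides_no / population * 100000, 2)
print(rate)
```

The results match the `suicides.100k.pop` values recorded for those rows (6.71, 5.19, and 4.83).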

Independent Variables

Gross Domestic Product. Quantitative variable
Generation. Qualitative variable

Relevant summary statistics

options(scipen = 999)

# parse_number() replaces the deprecated extract_numeric(); it drops the thousands separators
master_file$gdp_for_year.... <- readr::parse_number(master_file$gdp_for_year....)
# summary of each variable
summary(master_file)
##   ï..country             year          sex                age           
##  Length:27820       Min.   :1985   Length:27820       Length:27820      
##  Class :character   1st Qu.:1995   Class :character   Class :character  
##  Mode  :character   Median :2002   Mode  :character   Mode  :character  
##                     Mean   :2001                                        
##                     3rd Qu.:2008                                        
##                     Max.   :2016                                        
##                                                                         
##   suicides_no        population       suicides.100k.pop country.year      
##  Min.   :    0.0   Min.   :     278   Min.   :  0.00    Length:27820      
##  1st Qu.:    3.0   1st Qu.:   97498   1st Qu.:  0.92    Class :character  
##  Median :   25.0   Median :  430150   Median :  5.99    Mode  :character  
##  Mean   :  242.6   Mean   : 1844794   Mean   : 12.82                      
##  3rd Qu.:  131.0   3rd Qu.: 1486143   3rd Qu.: 16.62                      
##  Max.   :22338.0   Max.   :43805214   Max.   :224.97                      
##                                                                           
##   HDI.for.year   gdp_for_year....         gdp_per_capita....
##  Min.   :0.483   Min.   :      46919625   Min.   :   251    
##  1st Qu.:0.713   1st Qu.:    8985352832   1st Qu.:  3447    
##  Median :0.779   Median :   48114688201   Median :  9372    
##  Mean   :0.777   Mean   :  445580969026   Mean   : 16866    
##  3rd Qu.:0.855   3rd Qu.:  260202429150   3rd Qu.: 24874    
##  Max.   :0.944   Max.   :18120714000000   Max.   :126352    
##  NA's   :19456                                              
##   generation       
##  Length:27820      
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 
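
The summary reports 19,456 missing values for `HDI.for.year` out of 27,820 rows. A short calculation using only those two figures shows how much of the column that is. On the loaded data frame the same share would come from `mean(is.na(master_file$HDI.for.year))`.

```r
# Figures read off the summary() output above.
n_missing <- 19456
n_rows    <- 27820

# Share of rows with no HDI value.
share_missing <- n_missing / n_rows
print(round(share_missing, 3))
```

Roughly 70% of the rows have no HDI value, so any analysis that uses HDI works with a much smaller subset of the cases.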


Scatterplot

Relationship between suicide rates and GDP per capita

plot(master_file$gdp_per_capita...., master_file$suicides.100k.pop,
     xlab = "GDP per capita", ylab = "Suicides per 100k population")

Boxplot

Suicide rate per generation

ggplot(master_file, aes(generation, suicides.100k.pop)) + geom_boxplot()
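
Because `generation` is stored as character, ggplot2 orders the boxes alphabetically. The following is a sketch of one way to put them in birth-era order instead, assuming that order is wanted. Only the four labels visible in the rows shown earlier are listed. Any other label in the full data would need to be added to `gen_levels`, otherwise it becomes `NA`.

```r
# Oldest to youngest, limited to the labels that appear in the first 20 rows.
gen_levels <- c("G.I. Generation", "Silent", "Boomers", "Generation X")

# A few example values standing in for master_file$generation.
generation <- c("Generation X", "Silent", "G.I. Generation", "Boomers", "Silent")

# An explicit level order controls the order of the boxes in the plot.
generation_f <- factor(generation, levels = gen_levels)
print(levels(generation_f))
print(table(generation_f))
```

Applied to the data, this would be `master_file$generation <- factor(master_file$generation, levels = gen_levels)` before the `ggplot()` call.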

Histogram

Suicide rates by sex

ggplot(master_file, aes(x = suicides.100k.pop, fill = sex, color = sex)) +
  geom_histogram(alpha = 0.5, position = "identity", binwidth = 5)
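
A numeric summary can sit alongside the histogram. The sketch below computes the median rate by sex for the twelve Albania 1987 rows shown earlier. It covers a single country and year, so it illustrates the calculation rather than the overall result.

```r
# Rates for the six male and six female age groups in Albania, 1987.
albania_1987 <- data.frame(
  sex = rep(c("male", "female"), each = 6),
  suicides.100k.pop = c(6.71, 5.19, 4.59, 3.28, 0.73, 0.00,
                        4.83, 2.81, 2.15, 1.56, 0.00, 0.00)
)

# Median rate within each sex.
median_by_sex <- tapply(albania_1987$suicides.100k.pop, albania_1987$sex, median)
print(median_by_sex)
```

The same call on `master_file$suicides.100k.pop` and `master_file$sex` gives the medians for the full data.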