Part 1 - Introduction

Suicide often derives from deep feelings of hopelessness. People at risk are often unable to see solutions to their problems or to cope with challenging life circumstances, which can lead them to see suicide as the only way out. According to the World Health Organization, suicide is a major public health problem worldwide and a leading cause of death. Over 800,000 people die by suicide every year, roughly one person every 40 seconds. However, suicide is preventable when timely, effective interventions are implemented at national, municipal and individual levels. Suicide is not confined to high-income countries; it is a global phenomenon that occurs in all regions of the world. In fact, over 79% of global suicides in 2016 occurred in low- and middle-income countries. This study examines whether a country’s suicide rate changes with its standard of living, as measured by GDP.

Part 2 - Data

The data used in this project were extracted from an online database (Kaggle.com). The dataset was compiled from four different sources (United Nations Development Program (HDI), World Bank, World Health Organization, and Szmali) to identify any attributes that correlate with suicide rates globally.

There are a total of 27,820 cases. Each case represents one country, year, sex and age group, and records the suicide count and rate for that group together with the country’s GDP for that year. The data cover the years 1985 to 2016. As mentioned earlier, the research is concerned with discovering any relationship between suicide rate and GDP, so these two variables are the main focus. Both are quantitative.

This is an observational study, since the participants are observed without any kind of interference. The goal is therefore to see whether there is any relationship between GDP and suicide. The population of interest is all persons aged 5 and up who died by suicide. The data come from countries around the world, so we generalize our conclusions to the global population. However, since the study is observational, the findings cannot be used to establish causal relationships, only correlation. To limit concerns about bias, we have to assume that every country reported all suicide events equally well; otherwise our conclusions may be incorrect. High suicide figures can reflect badly on a country, so if cases go unreported, the results will be inaccurate.

Limitations and Assumptions
  • The data in this research is based on reports up to 2016.
  • Outliers are identified but kept in the report, as they are the actual statistics reported by the source.
  • The significance level is set to 0.05, and homogeneity of variance in the data is assumed.

Part 3 - Exploratory data analysis

Clean Data

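library(tidyverse)    # dplyr (%>%, select), stringr (str_to_title), readr (parse_number), ggplot2
library(countrycode)  # maps country names to continents
url <- "https://raw.githubusercontent.com/javernw/JWCUNYAssignments/master/master.csv"
master_file <- read.csv(url, stringsAsFactors = FALSE, header = TRUE)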

dim(master_file)
## [1] 27820    12

HDI is removed because the column is almost empty, and country.year is removed because it is repetitive.

# remove useless columns
suicide <- master_file %>% dplyr::select(-HDI.for.year, -country.year)
#rename columns
names(suicide) <- str_to_title(names(suicide), locale = "en") %>% gsub("Ï..Country","Country", .) %>% gsub("Suicides.100k.pop", "Suicide_per_100k", .) %>% gsub("\\.\\.\\.\\.", "", .)

#add continent
suicide$Continent <- countrycode(sourcevar = suicide$Country,
                            origin = "country.name",
                            destination = "continent")
#rearrange columns
suicide <- suicide[, c("Continent", "Country", "Year", "Sex" ,"Age", "Suicides_no", "Population", "Suicide_per_100k", "Gdp_for_year", "Gdp_per_capita", "Generation")]

# only keep rows where data is not missing
suicide <- suicide[complete.cases(suicide), ]

suicide$Gdp_for_year <- parse_number(suicide$Gdp_for_year)
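
The decision to drop a column because it is "almost empty" can be backed by a quick count of missing values. The sketch below is only an illustration on a small made-up data frame: the values and the 0.5 cutoff are assumptions, not figures from the real file.

```r
# Toy data frame standing in for the raw file (values are invented for illustration)
toy <- data.frame(
  country        = c("A", "A", "B", "B", "C"),
  HDI.for.year   = c(NA, NA, 0.71, NA, NA),
  gdp_per_capita = c(796, 769, 1200, 1250, 3100)
)

# Share of missing values in each column
missing_share <- colMeans(is.na(toy))
print(round(missing_share, 2))

# Columns whose missing share exceeds a chosen cutoff (0.5 is an arbitrary choice)
mostly_empty <- names(missing_share)[missing_share > 0.5]
print(mostly_empty)
```

Running the same two lines on master_file would show how sparse HDI.for.year really is before it is dropped.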


A view of the suicide data frame after these adjustments, showing the first 200 rows.

kable(head(suicide, 200)) %>% kable_styling(bootstrap_options = "striped" ,font_size = 11) %>% scroll_box(height = "500px")
Continent Country Year Sex Age Suicides_no Population Suicide_per_100k Gdp_for_year Gdp_per_capita Generation
Europe Albania 1987 male 15-24 years 21 312900 6.71 2156624900 796 Generation X
Europe Albania 1987 male 35-54 years 16 308000 5.19 2156624900 796 Silent
Europe Albania 1987 female 15-24 years 14 289700 4.83 2156624900 796 Generation X
Europe Albania 1987 male 75+ years 1 21800 4.59 2156624900 796 G.I. Generation
Europe Albania 1987 male 25-34 years 9 274300 3.28 2156624900 796 Boomers
Europe Albania 1987 female 75+ years 1 35600 2.81 2156624900 796 G.I. Generation
Europe Albania 1987 female 35-54 years 6 278800 2.15 2156624900 796 Silent
Europe Albania 1987 female 25-34 years 4 257200 1.56 2156624900 796 Boomers
Europe Albania 1987 male 55-74 years 1 137500 0.73 2156624900 796 G.I. Generation
Europe Albania 1987 female 5-14 years 0 311000 0.00 2156624900 796 Generation X
Europe Albania 1987 female 55-74 years 0 144600 0.00 2156624900 796 G.I. Generation
Europe Albania 1987 male 5-14 years 0 338200 0.00 2156624900 796 Generation X
Europe Albania 1988 female 75+ years 2 36400 5.49 2126000000 769 G.I. Generation
Europe Albania 1988 male 15-24 years 17 319200 5.33 2126000000 769 Generation X
Europe Albania 1988 male 75+ years 1 22300 4.48 2126000000 769 G.I. Generation
Europe Albania 1988 male 35-54 years 14 314100 4.46 2126000000 769 Silent
Europe Albania 1988 male 55-74 years 4 140200 2.85 2126000000 769 G.I. Generation
Europe Albania 1988 female 15-24 years 8 295600 2.71 2126000000 769 Generation X
Europe Albania 1988 female 55-74 years 3 147500 2.03 2126000000 769 G.I. Generation
Europe Albania 1988 female 25-34 years 5 262400 1.91 2126000000 769 Boomers
Europe Albania 1988 male 25-34 years 5 279900 1.79 2126000000 769 Boomers
Europe Albania 1988 female 35-54 years 4 284500 1.41 2126000000 769 Silent
Europe Albania 1988 female 5-14 years 0 317200 0.00 2126000000 769 Generation X
Europe Albania 1988 male 5-14 years 0 345000 0.00 2126000000 769 Generation X
Europe Albania 1989 male 75+ years 2 22500 8.89 2335124988 833 G.I. Generation
Europe Albania 1989 male 25-34 years 18 283600 6.35 2335124988 833 Boomers
Europe Albania 1989 male 35-54 years 15 318400 4.71 2335124988 833 Silent
Europe Albania 1989 male 55-74 years 6 142100 4.22 2335124988 833 G.I. Generation
Europe Albania 1989 male 15-24 years 12 323500 3.71 2335124988 833 Generation X
Europe Albania 1989 female 35-54 years 7 288600 2.43 2335124988 833 Silent
Europe Albania 1989 female 15-24 years 5 299900 1.67 2335124988 833 Generation X
Europe Albania 1989 female 25-34 years 2 266300 0.75 2335124988 833 Boomers
Europe Albania 1989 female 55-74 years 1 149600 0.67 2335124988 833 G.I. Generation
Europe Albania 1989 female 5-14 years 0 321900 0.00 2335124988 833 Generation X
Europe Albania 1989 female 75+ years 0 37000 0.00 2335124988 833 G.I. Generation
Europe Albania 1989 male 5-14 years 0 349700 0.00 2335124988 833 Generation X
Europe Albania 1992 male 35-54 years 12 343800 3.49 709452584 251 Boomers
Europe Albania 1992 male 15-24 years 9 263700 3.41 709452584 251 Generation X
Europe Albania 1992 male 55-74 years 5 159500 3.13 709452584 251 Silent
Europe Albania 1992 male 25-34 years 7 245500 2.85 709452584 251 Boomers
Europe Albania 1992 female 15-24 years 7 292400 2.39 709452584 251 Generation X
Europe Albania 1992 female 25-34 years 4 267400 1.50 709452584 251 Boomers
Europe Albania 1992 female 35-54 years 2 323100 0.62 709452584 251 Boomers
Europe Albania 1992 female 55-74 years 1 164900 0.61 709452584 251 Silent
Europe Albania 1992 female 5-14 years 0 336700 0.00 709452584 251 Millenials
Europe Albania 1992 female 75+ years 0 38700 0.00 709452584 251 G.I. Generation
Europe Albania 1992 male 5-14 years 0 362900 0.00 709452584 251 Millenials
Europe Albania 1992 male 75+ years 0 23900 0.00 709452584 251 G.I. Generation
Europe Albania 1993 male 15-24 years 18 243300 7.40 1228071038 437 Generation X
Europe Albania 1993 male 55-74 years 7 165000 4.24 1228071038 437 Silent
Europe Albania 1993 male 75+ years 1 24200 4.13 1228071038 437 G.I. Generation
Europe Albania 1993 male 25-34 years 9 230100 3.91 1228071038 437 Boomers
Europe Albania 1993 female 15-24 years 10 285300 3.51 1228071038 437 Generation X
Europe Albania 1993 male 35-54 years 10 350300 2.85 1228071038 437 Boomers
Europe Albania 1993 female 25-34 years 7 261800 2.67 1228071038 437 Boomers
Europe Albania 1993 female 35-54 years 7 331200 2.11 1228071038 437 Boomers
Europe Albania 1993 female 55-74 years 2 169500 1.18 1228071038 437 Silent
Europe Albania 1993 female 5-14 years 1 340300 0.29 1228071038 437 Millenials
Europe Albania 1993 male 5-14 years 1 367000 0.27 1228071038 437 Millenials
Europe Albania 1993 female 75+ years 0 39300 0.00 1228071038 437 G.I. Generation
Europe Albania 1994 male 75+ years 2 24600 8.13 1985673798 697 G.I. Generation
Europe Albania 1994 male 55-74 years 11 171400 6.42 1985673798 697 Silent
Europe Albania 1994 female 75+ years 2 39900 5.01 1985673798 697 G.I. Generation
Europe Albania 1994 male 25-34 years 6 231400 2.59 1985673798 697 Boomers
Europe Albania 1994 male 35-54 years 9 362800 2.48 1985673798 697 Boomers
Europe Albania 1994 male 15-24 years 6 242200 2.48 1985673798 697 Generation X
Europe Albania 1994 female 15-24 years 6 282600 2.12 1985673798 697 Generation X
Europe Albania 1994 female 25-34 years 4 261100 1.53 1985673798 697 Boomers
Europe Albania 1994 female 35-54 years 2 342500 0.58 1985673798 697 Boomers
Europe Albania 1994 female 55-74 years 1 174600 0.57 1985673798 697 Silent
Europe Albania 1994 male 5-14 years 1 371800 0.27 1985673798 697 Millenials
Europe Albania 1994 female 5-14 years 0 344400 0.00 1985673798 697 Millenials
Europe Albania 1995 male 25-34 years 13 232900 5.58 2424499009 835 Generation X
Europe Albania 1995 male 55-74 years 9 178000 5.06 2424499009 835 Silent
Europe Albania 1995 female 75+ years 2 40800 4.90 2424499009 835 G.I. Generation
Europe Albania 1995 female 15-24 years 13 283500 4.59 2424499009 835 Generation X
Europe Albania 1995 male 15-24 years 11 241200 4.56 2424499009 835 Generation X
Europe Albania 1995 male 75+ years 1 25100 3.98 2424499009 835 G.I. Generation
Europe Albania 1995 male 35-54 years 14 375900 3.72 2424499009 835 Boomers
Europe Albania 1995 female 25-34 years 7 264000 2.65 2424499009 835 Generation X
Europe Albania 1995 female 35-54 years 8 356400 2.24 2424499009 835 Boomers
Europe Albania 1995 male 5-14 years 6 376500 1.59 2424499009 835 Millenials
Europe Albania 1995 female 55-74 years 2 180400 1.11 2424499009 835 Silent
Europe Albania 1995 female 5-14 years 2 348700 0.57 2424499009 835 Millenials
Europe Albania 1996 male 75+ years 2 25400 7.87 3314898292 1127 G.I. Generation
Europe Albania 1996 male 15-24 years 17 243600 6.98 3314898292 1127 Generation X
Europe Albania 1996 male 25-34 years 14 235300 5.95 3314898292 1127 Generation X
Europe Albania 1996 female 15-24 years 16 287700 5.56 3314898292 1127 Generation X
Europe Albania 1996 female 75+ years 2 41200 4.85 3314898292 1127 G.I. Generation
Europe Albania 1996 female 25-34 years 10 267900 3.73 3314898292 1127 Generation X
Europe Albania 1996 male 35-54 years 12 379600 3.16 3314898292 1127 Boomers
Europe Albania 1996 female 35-54 years 9 362000 2.49 3314898292 1127 Boomers
Europe Albania 1996 male 55-74 years 3 179900 1.67 3314898292 1127 Silent
Europe Albania 1996 female 55-74 years 1 183100 0.55 3314898292 1127 Silent
Europe Albania 1996 male 5-14 years 2 380400 0.53 3314898292 1127 Millenials
Europe Albania 1996 female 5-14 years 1 354100 0.28 3314898292 1127 Millenials
Europe Albania 1997 male 25-34 years 36 236000 15.25 2359903108 793 Generation X
Europe Albania 1997 male 15-24 years 33 244400 13.50 2359903108 793 Generation X
Europe Albania 1997 male 75+ years 3 25400 11.81 2359903108 793 G.I. Generation
Europe Albania 1997 male 35-54 years 30 380800 7.88 2359903108 793 Boomers
Europe Albania 1997 female 15-24 years 21 294000 7.14 2359903108 793 Generation X
Europe Albania 1997 male 55-74 years 12 180300 6.66 2359903108 793 Silent
Europe Albania 1997 female 25-34 years 16 273900 5.84 2359903108 793 Generation X
Europe Albania 1997 female 75+ years 2 42100 4.75 2359903108 793 G.I. Generation
Europe Albania 1997 female 35-54 years 7 370100 1.89 2359903108 793 Boomers
Europe Albania 1997 female 5-14 years 6 361800 1.66 2359903108 793 Millenials
Europe Albania 1997 male 5-14 years 4 381500 1.05 2359903108 793 Millenials
Europe Albania 1997 female 55-74 years 0 187000 0.00 2359903108 793 Silent
Europe Albania 1998 male 75+ years 3 25800 11.63 2707123772 899 G.I. Generation
Europe Albania 1998 male 15-24 years 27 248800 10.85 2707123772 899 Generation X
Europe Albania 1998 female 15-24 years 32 295600 10.83 2707123772 899 Generation X
Europe Albania 1998 male 25-34 years 26 240400 10.82 2707123772 899 Generation X
Europe Albania 1998 male 35-54 years 29 388200 7.47 2707123772 899 Boomers
Europe Albania 1998 male 55-74 years 9 183800 4.90 2707123772 899 Silent
Europe Albania 1998 female 25-34 years 10 275300 3.63 2707123772 899 Generation X
Europe Albania 1998 female 55-74 years 6 188200 3.19 2707123772 899 Silent
Europe Albania 1998 female 35-54 years 9 372100 2.42 2707123772 899 Boomers
Europe Albania 1998 male 5-14 years 2 388400 0.51 2707123772 899 Millenials
Europe Albania 1998 female 5-14 years 1 363800 0.27 2707123772 899 Millenials
Europe Albania 1998 female 75+ years 0 42300 0.00 2707123772 899 G.I. Generation
Europe Albania 1999 male 75+ years 3 25900 11.58 3414760915 1127 G.I. Generation
Europe Albania 1999 male 15-24 years 24 250600 9.58 3414760915 1127 Generation X
Europe Albania 1999 female 75+ years 4 42400 9.43 3414760915 1127 G.I. Generation
Europe Albania 1999 male 35-54 years 31 391100 7.93 3414760915 1127 Boomers
Europe Albania 1999 male 25-34 years 19 242300 7.84 3414760915 1127 Generation X
Europe Albania 1999 male 55-74 years 14 185200 7.56 3414760915 1127 Silent
Europe Albania 1999 female 15-24 years 19 296800 6.40 3414760915 1127 Generation X
Europe Albania 1999 female 25-34 years 13 276500 4.70 3414760915 1127 Generation X
Europe Albania 1999 female 55-74 years 6 188800 3.18 3414760915 1127 Silent
Europe Albania 1999 female 35-54 years 5 373600 1.34 3414760915 1127 Boomers
Europe Albania 1999 female 5-14 years 1 365200 0.27 3414760915 1127 Millenials
Europe Albania 1999 male 5-14 years 0 391300 0.00 3414760915 1127 Millenials
Europe Albania 2000 male 25-34 years 17 232000 7.33 3632043908 1299 Generation X
Europe Albania 2000 male 55-74 years 10 177400 5.64 3632043908 1299 Silent
Europe Albania 2000 female 75+ years 2 37800 5.29 3632043908 1299 G.I. Generation
Europe Albania 2000 male 75+ years 1 24900 4.02 3632043908 1299 G.I. Generation
Europe Albania 2000 female 15-24 years 6 263900 2.27 3632043908 1299 Generation X
Europe Albania 2000 male 15-24 years 5 240000 2.08 3632043908 1299 Generation X
Europe Albania 2000 female 35-54 years 5 332200 1.51 3632043908 1299 Boomers
Europe Albania 2000 female 25-34 years 3 245800 1.22 3632043908 1299 Generation X
Europe Albania 2000 male 35-54 years 4 374700 1.07 3632043908 1299 Boomers
Europe Albania 2000 male 5-14 years 1 374900 0.27 3632043908 1299 Millenials
Europe Albania 2000 female 5-14 years 0 324700 0.00 3632043908 1299 Millenials
Europe Albania 2000 female 55-74 years 0 168000 0.00 3632043908 1299 Silent
Europe Albania 2001 male 25-34 years 22 206484 10.65 4060758804 1451 Generation X
Europe Albania 2001 male 35-54 years 34 378826 8.98 4060758804 1451 Boomers
Europe Albania 2001 male 55-74 years 11 196670 5.59 4060758804 1451 Silent
Europe Albania 2001 female 75+ years 2 47254 4.23 4060758804 1451 Silent
Europe Albania 2001 male 15-24 years 10 256039 3.91 4060758804 1451 Millenials
Europe Albania 2001 female 15-24 years 9 271359 3.32 4060758804 1451 Millenials
Europe Albania 2001 female 35-54 years 12 370191 3.24 4060758804 1451 Boomers
Europe Albania 2001 male 75+ years 1 31044 3.22 4060758804 1451 Silent
Europe Albania 2001 female 55-74 years 6 189799 3.16 4060758804 1451 Silent
Europe Albania 2001 male 5-14 years 6 321556 1.87 4060758804 1451 Millenials
Europe Albania 2001 female 25-34 years 4 222771 1.80 4060758804 1451 Generation X
Europe Albania 2001 female 5-14 years 2 307356 0.65 4060758804 1451 Millenials
Europe Albania 2002 male 75+ years 4 31007 12.90 4435078648 1573 Silent
Europe Albania 2002 male 25-34 years 23 206286 11.15 4435078648 1573 Generation X
Europe Albania 2002 male 35-54 years 35 382139 9.16 4435078648 1573 Boomers
Europe Albania 2002 male 55-74 years 13 198130 6.56 4435078648 1573 Silent
Europe Albania 2002 male 15-24 years 15 263067 5.70 4435078648 1573 Millenials
Europe Albania 2002 female 15-24 years 14 275970 5.07 4435078648 1573 Millenials
Europe Albania 2002 female 35-54 years 15 375113 4.00 4435078648 1573 Boomers
Europe Albania 2002 female 25-34 years 7 223685 3.13 4435078648 1573 Generation X
Europe Albania 2002 female 75+ years 1 47407 2.11 4435078648 1573 Silent
Europe Albania 2002 female 55-74 years 4 191712 2.09 4435078648 1573 Silent
Europe Albania 2002 female 5-14 years 1 304850 0.33 4435078648 1573 Millenials
Europe Albania 2002 male 5-14 years 1 319473 0.31 4435078648 1573 Millenials
Europe Albania 2003 female 75+ years 6 49088 12.22 5746945913 2021 Silent
Europe Albania 2003 male 55-74 years 16 201520 7.94 5746945913 2021 Silent
Europe Albania 2003 male 35-54 years 28 386196 7.25 5746945913 2021 Boomers
Europe Albania 2003 male 15-24 years 15 273235 5.49 5746945913 2021 Millenials
Europe Albania 2003 female 15-24 years 14 283709 4.93 5746945913 2021 Millenials
Europe Albania 2003 female 55-74 years 9 195699 4.60 5746945913 2021 Silent
Europe Albania 2003 male 25-34 years 9 205433 4.38 5746945913 2021 Generation X
Europe Albania 2003 female 25-34 years 9 222941 4.04 5746945913 2021 Generation X
Europe Albania 2003 female 35-54 years 13 381760 3.41 5746945913 2021 Boomers
Europe Albania 2003 male 75+ years 1 32667 3.06 5746945913 2021 Silent
Europe Albania 2003 male 5-14 years 4 313204 1.28 5746945913 2021 Millenials
Europe Albania 2003 female 5-14 years 0 298477 0.00 5746945913 2021 Millenials
Europe Albania 2004 male 75+ years 4 35526 11.26 7314865176 2544 Silent
Europe Albania 2004 male 35-54 years 39 391767 9.95 7314865176 2544 Boomers
Europe Albania 2004 male 25-34 years 16 203938 7.85 7314865176 2544 Generation X
Europe Albania 2004 female 15-24 years 20 292268 6.84 7314865176 2544 Millenials
Europe Albania 2004 male 15-24 years 19 286768 6.63 7314865176 2544 Millenials
Europe Albania 2004 female 75+ years 3 50970 5.89 7314865176 2544 Silent
Europe Albania 2004 female 25-34 years 11 222389 4.95 7314865176 2544 Generation X
Europe Albania 2004 male 55-74 years 10 207202 4.83 7314865176 2544 Silent
Europe Albania 2004 female 35-54 years 17 391436 4.34 7314865176 2544 Boomers
Europe Albania 2004 female 55-74 years 3 203841 1.47 7314865176 2544 Silent
Europe Albania 2004 female 5-14 years 3 286705 1.05 7314865176 2544 Millenials
Europe Albania 2004 male 5-14 years 1 302181 0.33 7314865176 2544 Millenials
Europe Albania 2005 female 15-24 years 0 281922 0.00 8158548717 2931 Millenials
Europe Albania 2005 female 25-34 years 0 190745 0.00 8158548717 2931 Generation X
Europe Albania 2005 female 35-54 years 0 386513 0.00 8158548717 2931 Boomers
Europe Albania 2005 female 5-14 years 0 276559 0.00 8158548717 2931 Millenials
Europe Albania 2005 female 55-74 years 0 210998 0.00 8158548717 2931 Silent
Europe Albania 2005 female 75+ years 0 53191 0.00 8158548717 2931 Silent
Europe Albania 2005 male 15-24 years 0 281675 0.00 8158548717 2931 Millenials
Europe Albania 2005 male 25-34 years 0 177519 0.00 8158548717 2931 Generation X
Summary
glimpse(suicide)
## Observations: 27,820
## Variables: 11
## $ Continent        <chr> "Europe", "Europe", "Europe", "Europe", "Euro...
## $ Country          <chr> "Albania", "Albania", "Albania", "Albania", "...
## $ Year             <int> 1987, 1987, 1987, 1987, 1987, 1987, 1987, 198...
## $ Sex              <chr> "male", "male", "female", "male", "male", "fe...
## $ Age              <chr> "15-24 years", "35-54 years", "15-24 years", ...
## $ Suicides_no      <int> 21, 16, 14, 1, 9, 1, 6, 4, 1, 0, 0, 0, 2, 17,...
## $ Population       <int> 312900, 308000, 289700, 21800, 274300, 35600,...
## $ Suicide_per_100k <dbl> 6.71, 5.19, 4.83, 4.59, 3.28, 2.81, 2.15, 1.5...
## $ Gdp_for_year     <dbl> 2156624900, 2156624900, 2156624900, 215662490...
## $ Gdp_per_capita   <int> 796, 796, 796, 796, 796, 796, 796, 796, 796, ...
## $ Generation       <chr> "Generation X", "Silent", "Generation X", "G....
summary(suicide)
##   Continent           Country               Year          Sex           
##  Length:27820       Length:27820       Min.   :1985   Length:27820      
##  Class :character   Class :character   1st Qu.:1995   Class :character  
##  Mode  :character   Mode  :character   Median :2002   Mode  :character  
##                                        Mean   :2001                     
##                                        3rd Qu.:2008                     
##                                        Max.   :2016                     
##      Age             Suicides_no        Population       Suicide_per_100k
##  Length:27820       Min.   :    0.0   Min.   :     278   Min.   :  0.00  
##  Class :character   1st Qu.:    3.0   1st Qu.:   97498   1st Qu.:  0.92  
##  Mode  :character   Median :   25.0   Median :  430150   Median :  5.99  
##                     Mean   :  242.6   Mean   : 1844794   Mean   : 12.82  
##                     3rd Qu.:  131.0   3rd Qu.: 1486143   3rd Qu.: 16.62  
##                     Max.   :22338.0   Max.   :43805214   Max.   :224.97  
##   Gdp_for_year       Gdp_per_capita    Generation       
##  Min.   :4.692e+07   Min.   :   251   Length:27820      
##  1st Qu.:8.985e+09   1st Qu.:  3447   Class :character  
##  Median :4.811e+10   Median :  9372   Mode  :character  
##  Mean   :4.456e+11   Mean   : 16866                     
##  3rd Qu.:2.602e+11   3rd Qu.: 24874                     
##  Max.   :1.812e+13   Max.   :126352

Suicide

Throughout The World

mapped <- joinCountryData2Map(suicide, joinCode="NAME", nameJoinColumn="Country")
## 27508 codes from your data successfully matched countries in the map
## 312 codes from your data failed to match with a country code in the map
## 144 codes from the map weren't represented in your data
mapCountryData(mapped, nameColumnToPlot="Suicides_no", mapTitle="Suicide Throughout The World", catMethod = "pretty", colourPalette = "rainbow")
## You asked for 7 categories, 5 were used due to pretty() classification

From this perspective, it seems as though continents with stronger economies report more suicides. We test later in this report whether this impression holds.

Another look at suicide rates by continent

ggplot(suicide, aes(x=Continent, y=Suicide_per_100k, fill=Continent)) + geom_bar(stat="identity")+theme_minimal() + scale_fill_brewer(palette="Set3")

As seen in the graph above, Europe appears to have the highest total, followed by the Americas, while Africa has the lowest, which matches the world map above. This does not necessarily mean that suicide is rare in Africa; it could also mean that some cases were not reported. Note also that the bars stack every row, so continents with more records in the data get taller bars.
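
Because a bar chart drawn with stat = "identity" adds up every row in a group, a summed rate depends on how many rows each continent contributes. The following sketch, on invented numbers, contrasts the summed rate with the mean rate per group. It is an alternative summary offered for comparison, not the method used in this report.

```r
# Invented data: continent X has more rows but a lower rate than continent Y
toy <- data.frame(
  Continent        = c("X", "X", "X", "X", "Y", "Y"),
  Suicide_per_100k = c(5, 5, 5, 5, 8, 8)
)

# What a stacked identity bar effectively shows: the sum of all rows
summed <- aggregate(Suicide_per_100k ~ Continent, data = toy, FUN = sum)

# A per-row average, which does not grow with the number of rows
averaged <- aggregate(Suicide_per_100k ~ Continent, data = toy, FUN = mean)

print(summed)
print(averaged)
```

In this toy case X has the larger sum but Y has the larger mean, so the ranking flips depending on the summary chosen.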

By Country (Average)

suicide2 <- suicide

data <- suicide2 %>% 
  dplyr::group_by(Country) %>% 
  summarise(Suiciderates = mean(Suicide_per_100k))

data$Country <- factor(data$Country, levels = rev( data$Country[order(-data$Suiciderates)]))


ggplot(data, aes(x=Country, y=Suiciderates, fill = Country)) + geom_bar(stat="identity")+ theme_minimal() + theme(axis.text=element_text(size=6)) + theme(legend.position = "none") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) 

Highest and lowest suicide rates
countries <- suicide %>% 
  dplyr::select(Country, Suicide_per_100k) %>% 
  dplyr::group_by(Country) %>% 
  dplyr::summarise(mean_suicide = mean(Suicide_per_100k)) %>% 
  dplyr::arrange(desc(mean_suicide)) %>% 
  data.frame()
## Warning: package 'bindrcpp' was built under R version 3.5.2
# rbind() keeps the result a data frame; c() would flatten it into a list of separate columns
kable(rbind(head(countries, 5), tail(countries, 5)), row.names = FALSE) %>% kable_styling(bootstrap_options = "striped" ,font_size = 10)
Country                  mean_suicide
Lithuania                40.41557
Sri Lanka                35.29515
Russian Federation       34.89238
Hungary                  32.76152
Belarus                  31.07591
Oman                     0.7361111
Antigua and Barbuda      0.5529012
Jamaica                  0.5217647
Dominica                 0.0000000
Saint Kitts and Nevis    0.0000000

Over The Years

average <- (sum(as.numeric(suicide$Suicides_no)) / sum(as.numeric(suicide$Population))) * 100000

suicide %>%
  group_by(Year) %>%
  summarise(population = sum(Population), 
            suicides = sum(Suicides_no), 
            suicides_per_100k = (suicides / population) * 100000) %>%
  ggplot(aes(x = Year, y = suicides_per_100k)) + 
  geom_line(aes(color = Year), size = 1) + 
  geom_point(size = 2) +
  geom_hline(yintercept = average, size = 1, linetype = 2) + 
  labs(title = "Global Suicides (per 100k) 1985 - 2016",
       x = "Year", 
       y = "Suicides per 100k") + 
  scale_x_continuous(breaks = seq(1985, 2016, 2)) + 
  scale_y_continuous(breaks = seq(10, 30)) + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Suicide rates have decreased over the years. One possible contributor is that people are becoming more aware and knowledgeable as more prevention programs are put into place. In addition, social media is part of daily life and people are more connected than ever, which may make help easier to reach.
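
The yearly figure plotted above is a pooled rate: total suicides divided by total population, times 100,000. That is not the same as averaging the per-row rates, because the pooled rate weights each row by its population. A small sketch with invented numbers shows the difference.

```r
# Two invented rows: a small population with a high rate, a large one with a low rate
toy <- data.frame(
  Suicides_no = c(10, 100),
  Population  = c(100000, 10000000)
)

# Rate per 100k within each row
row_rates <- toy$Suicides_no / toy$Population * 100000

# Unweighted mean of the row rates
mean_of_rates <- mean(row_rates)

# Pooled rate, as computed for the yearly trend line
pooled_rate <- sum(toy$Suicides_no) / sum(toy$Population) * 100000

print(c(mean_of_rates = mean_of_rates, pooled_rate = pooled_rate))
```

The pooled rate stays close to the rate of the large population, while the unweighted mean is pulled toward the small one.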

Outliers

suicide %>% 
  dplyr::group_by(Year) %>%  
  ggplot(aes(x = Year, y = Suicide_per_100k, group = Year)) + 
  geom_boxplot() + 
  labs(title = "Global Suicides (per 100k) 1985 - 2016",
       x = "Year", 
       y = "Suicides per 100k") + 
  scale_x_continuous(breaks = seq(1985, 2016, 1)) + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

By Age

suicide %>%
  group_by(Year, Age) %>%
  summarise(population = sum(Population), 
            suicides = sum(Suicides_no), 
            suicides_per_100k = (suicides / population) * 100000) %>%
  ggplot(aes(x = Year, y = suicides_per_100k, group = Age)) + 
  geom_line(aes(color = Age), size = 1) + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
   labs(title = "Global Suicides By Age 1985 - 2016",
       x = "Year", 
       y = "Suicides per 100k") + 
  scale_x_continuous(breaks = seq(1985, 2016, 2))

By Generation

ggplot(suicide, aes(x = Generation, y = Suicide_per_100k, fill = Generation)) + geom_boxplot() + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + theme(axis.text=element_text(size=8)) + theme(legend.position = "none")

Suicide by Gender

plot1 <- ggplot(suicide, aes(x = Sex, y = Suicide_per_100k, fill = Sex)) + geom_bar(stat = "identity")

plot2 <- ggplot(suicide, aes(x =Suicide_per_100k, fill = Sex, color = Sex )) + geom_histogram(alpha=0.5, position="identity")

grid.arrange(plot1, plot2, ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Fewer women than men died by suicide. The global distribution of suicide rates is unimodal and right-skewed (positively skewed) for both males and females.

suicide %>% filter(Sex == "female") %>% summarise(Female = mean(Suicides_no))
##     Female
## 1 112.1143
suicide %>% filter(Sex == "male") %>% summarise(Male = mean(Suicides_no))
##       Male
## 1 373.0345

On average, each record (one country, year and age group) reports about 373 suicides among men compared with about 112 among women.

Here is a look at the outliers associated with each sex in each continent.

ggplot(suicide, aes(x=Continent, y=Suicide_per_100k, fill=Sex)) +
  geom_boxplot() +
  theme_minimal() + 
  scale_fill_brewer(palette="Set3")

Overall more men are committing suicide than women.


GDP

For Countries Annually

suicide %>% ggplot(aes(x = Country, y = Gdp_per_capita, fill = Country)) + geom_bar(stat = "identity") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + theme(axis.text=element_text(size=6)) + theme(legend.position='none') + transition_time(Year) + labs(title = "Year: {frame_time}")

By Continent

meangdp <- mean(suicide$Gdp_per_capita)

his <- suicide %>% 
  ggplot(aes(x = Gdp_per_capita, fill = Continent)) +
  geom_histogram(position="identity", alpha=0.7) + 
  # Add a dashed red line at the overall mean; the colour is set outside aes()
  # so that it is a fixed colour rather than a legend entry labelled "red"
  geom_vline(xintercept = meangdp, color = "red", linetype = "dashed") + 
  ggtitle("Distribution of GDP")

bar <- ggplot(suicide, aes(x=Continent, y=Gdp_per_capita, fill=Continent)) + geom_bar(stat="identity")+theme_minimal() + scale_fill_brewer(palette="Set3")

grid.arrange(his, bar, ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Notice that Europe has the most reported suicides (shown earlier) and also the highest summed GDP per capita.

Richest and poorest countries (per capita)
countries2 <- suicide %>% 
  dplyr::select(Country, Gdp_per_capita) %>% 
  dplyr::group_by(Country) %>% 
  dplyr::summarise(mean_gdppc = mean(Gdp_per_capita)) %>% 
  dplyr::arrange(desc(mean_gdppc)) %>% 
  data.frame()

# rbind() keeps the result a data frame; c() would flatten it into a list of separate columns
kable(rbind(head(countries2, 5), tail(countries2, 5)), row.names = FALSE) %>% kable_styling(bootstrap_options = "striped" ,font_size = 10)
Country        mean_gdppc
Luxembourg     68798.39
Qatar          67756.45
Switzerland    62981.76
Norway         57319.60
San Marino     53663.67
Azerbaijan     1005.1250
Uzbekistan     976.1818
Sri Lanka      904.2727
Kiribati       875.9091
Kyrgyzstan     720.7308

GDP and Suicide

suicide %>% 
  ggplot(aes(x = Gdp_per_capita, y = Suicide_per_100k, group = Continent)) + 
  geom_point(aes(color = Continent), size = 1) + labs(y = "Suicide", x = "GDP Per Capita")
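
A scatterplot like this is often paired with a correlation coefficient. The sketch below uses simulated, skewed data (not the suicide data) to show how Pearson and Spearman correlations can be computed with cor.test(). Spearman is included on the assumption that a rank-based measure is a reasonable companion when both variables are strongly skewed.

```r
# Simulated right-skewed variables; names and parameters are invented for illustration
set.seed(1)
gdp_sim  <- rexp(200, rate = 1 / 15000)
rate_sim <- rexp(200, rate = 1 / 12)

# Pearson measures linear association; Spearman measures monotone (rank) association
pearson  <- cor.test(gdp_sim, rate_sim, method = "pearson")
spearman <- cor.test(gdp_sim, rate_sim, method = "spearman")

print(pearson$estimate)
print(spearman$estimate)
```

On the real data the same calls would take suicide$Gdp_per_capita and suicide$Suicide_per_100k as arguments.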

ANOVA

Are the average suicides the same across the world?
group_by(suicide, Continent) %>%
  summarise(
    size = n(),
    mean = mean(Suicides_no, na.rm = TRUE),
    sd = sd(Suicides_no, na.rm = TRUE)
  )
## # A tibble: 5 x 4
##   Continent  size  mean     sd
##   <chr>     <int> <dbl>  <dbl>
## 1 Africa      850  13.4   23.7
## 2 Americas   9214 194.   798. 
## 3 Asia       5366 271.   838. 
## 4 Europe    11418 299.  1061. 
## 5 Oceania     972  87.3  148.
# Compute the analysis of variance
aov_continents <- aov(Suicides_no ~ Continent, data = suicide)
# Summary of the analysis
summary(aov_continents)
##                Df    Sum Sq  Mean Sq F value Pr(>F)    
## Continent       4 1.300e+08 32501192   40.17 <2e-16 ***
## Residuals   27815 2.251e+10   809134                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value is very small (< 0.05), so we reject the null hypothesis that the average suicide count is the same in every continent and conclude that at least one continent’s average differs. Many factors may contribute to these differences, such as the age of the population, religious orientation, race and ethnicity, education, economic status, and social or ethnic customs.
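
A significant ANOVA says only that the group means are not all equal; it does not say which groups differ. One common follow-up is Tukey's honest significant difference test, available in base R as TukeyHSD(). The sketch below runs it on simulated groups and is not part of the original analysis.

```r
# Simulated data: groups A and B share a mean, group C is shifted upward
set.seed(42)
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 30)),
  value = c(rnorm(30, mean = 10, sd = 2),
            rnorm(30, mean = 10, sd = 2),
            rnorm(30, mean = 20, sd = 2))
)

# One-way ANOVA followed by all pairwise comparisons
fit     <- aov(value ~ group, data = toy)
posthoc <- TukeyHSD(fit)

# Each row is one pair of groups: difference, confidence bounds, adjusted p-value
print(posthoc$group)
```

Applied to aov_continents, the same call would list every pair of continents with an adjusted p-value.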

Part 4 - Inference

Hypothesis

\(H_0: \beta_{\text{GDP}} = 0\). There is no relationship between GDP and suicide rates; the slope of GDP in the regression model is zero.

\(H_A: \beta_{\text{GDP}} \neq 0\). There is a relationship between GDP and suicide rates; suicide rates change as GDP changes.
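
In a linear regression, a hypothesis of this kind is usually assessed through the t-test on the slope. The sketch below fits a simple model to simulated data and extracts the slope estimate and its p-value. The variable names and the simulated relationship are assumptions made for illustration only.

```r
# Simulated predictor and response with a mild negative slope
set.seed(7)
x <- runif(100, min = 0, max = 50)
y <- 12 - 0.1 * x + rnorm(100, sd = 3)

fit <- lm(y ~ x)

# Coefficient table row for the slope: estimate, standard error, t value, p-value
slope_row <- coef(summary(fit))["x", ]
print(slope_row)

# Decision at the 0.05 significance level used in this report
reject_null <- slope_row[["Pr(>|t|)"]] < 0.05
print(reject_null)
```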

Check Conditions:

  • Independence: The sample consists of less than 10% of the population, so assuming independence is reasonable.

  • Normal: The distribution is strongly skewed; however, we can be lenient about the skewness because the sample contains far more than 30 cases.

Building A Model

Checking for multicollinearity using Pairwise Scatterplots
plot(suicide[, -c(1, 2, 4, 5, 11)])

Looking at our plot, it does not appear that any of our quantitative predictor variables are highly correlated, or have a strong linear relationship with one another.

Correlation Matrix
C <- cor(suicide[, -c(1, 2, 4, 5, 11)])

corrplot(C, type="upper", order="hclust",
         col=brewer.pal(n=8, name="Spectral"))

Since none of the correlations is greater than 0.9, the correlation matrix confirms what the pairwise scatterplots suggested.
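
The 0.9 cutoff can also be checked programmatically rather than by eye. The sketch below is illustrative only: it uses a small simulated data frame (`toy`, with hypothetical columns `a`, `b` and `c`) in place of the numeric columns of `suicide`, and the helper name `high_cor_pairs` is an assumption, not part of the original analysis.

```r
# Flag pairs of numeric variables whose absolute correlation exceeds a cutoff.
# `high_cor_pairs` is a hypothetical helper written for this illustration.
high_cor_pairs <- function(df, cutoff = 0.9) {
  C <- cor(df, use = "pairwise.complete.obs")
  # Keep only the upper triangle so each pair is reported once
  idx <- which(abs(C) > cutoff & upper.tri(C), arr.ind = TRUE)
  data.frame(
    var1 = rownames(C)[idx[, "row"]],
    var2 = colnames(C)[idx[, "col"]],
    r    = C[idx],
    stringsAsFactors = FALSE
  )
}

# Simulated stand-in data: `b` is nearly a copy of `a`, `c` is unrelated noise
set.seed(1)
a <- rnorm(200)
toy <- data.frame(a = a, b = a + rnorm(200, sd = 0.1), c = rnorm(200))

flagged <- high_cor_pairs(toy)
print(flagged)
```

An empty result from such a helper would correspond to the conclusion drawn from the correlation plot, namely that no pair of predictors crosses the cutoff.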

Stepwise Selection: Backward Elimination

The stepAIC() function from the MASS package is used to select a model by backward elimination, dropping terms for as long as doing so lowers the Akaike Information Criterion (AIC).

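# Full model containing every candidate predictor
start.model <- lm(Suicide_per_100k ~ Gdp_per_capita + Continent + Country + Year + Sex + Age + Suicides_no + Population + Gdp_for_year + Generation, data = suicide)
# Intercept-only model used as the lower bound of the search
lm.model <- lm(Suicide_per_100k ~ 1, data = suicide)
# Backward elimination by AIC, starting from the full model
stepAIC(start.model, scope = list(upper = start.model, lower = lm.model), direction = "backward")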
## Start:  AIC=141789.4
## Suicide_per_100k ~ Gdp_per_capita + Continent + Country + Year + 
##     Sex + Age + Suicides_no + Population + Gdp_for_year + Generation
## 
## 
## Step:  AIC=141789.4
## Suicide_per_100k ~ Gdp_per_capita + Country + Year + Sex + Age + 
##     Suicides_no + Population + Gdp_for_year + Generation
## 
##                   Df Sum of Sq     RSS    AIC
## <none>                         4509839 141789
## - Gdp_for_year     1       708 4510547 141792
## - Year             1      1731 4511571 141798
## - Generation       5      8308 4518147 141831
## - Gdp_per_capita   1     12214 4522053 141863
## - Population       1     36665 4546504 142013
## - Age              5    177620 4687460 142854
## - Suicides_no      1    292368 4802208 143535
## - Sex              1   1181868 5691708 148263
## - Country        100   1838596 6348435 151102
## 
## Call:
## lm(formula = Suicide_per_100k ~ Gdp_per_capita + Country + Year + 
##     Sex + Age + Suicides_no + Population + Gdp_for_year + Generation, 
##     data = suicide)
## 
## Coefficients:
##                         (Intercept)                       Gdp_per_capita  
##                           1.239e+02                           -9.423e-05  
##          CountryAntigua and Barbuda                     CountryArgentina  
##                          -2.306e+00                            8.227e+00  
##                      CountryArmenia                         CountryAruba  
##                          -1.201e-01                            8.314e+00  
##                    CountryAustralia                       CountryAustria  
##                           1.213e+01                            2.282e+01  
##                   CountryAzerbaijan                       CountryBahamas  
##                          -1.565e+00                           -8.137e-03  
##                      CountryBahrain                      CountryBarbados  
##                          -4.842e-02                            2.802e-01  
##                      CountryBelarus                       CountryBelgium  
##                           2.694e+01                            2.004e+01  
##                       CountryBelize        CountryBosnia and Herzegovina  
##                           2.871e+00                            1.976e+00  
##                       CountryBrazil                      CountryBulgaria  
##                           8.907e+00                            1.592e+01  
##                   CountryCabo Verde                        CountryCanada  
##                           8.179e+00                            1.135e+01  
##                        CountryChile                      CountryColombia  
##                           7.667e+00                            3.372e+00  
##                   CountryCosta Rica                       CountryCroatia  
##                           3.835e+00                            2.012e+01  
##                         CountryCuba                        CountryCyprus  
##                           1.792e+01                            2.480e+00  
##               CountryCzech Republic                       CountryDenmark  
##                           1.581e+01                            1.514e+01  
##                     CountryDominica                       CountryEcuador  
##                          -4.703e+00                            3.178e+00  
##                  CountryEl Salvador                       CountryEstonia  
##                           7.230e+00                            2.477e+01  
##                         CountryFiji                       CountryFinland  
##                           2.163e+00                            2.209e+01  
##                       CountryFrance                       CountryGeorgia  
##                           1.883e+01                            9.408e-01  
##                      CountryGermany                        CountryGreece  
##                           1.426e+01                            2.252e+00  
##                      CountryGrenada                     CountryGuatemala  
##                          -1.058e+00                            1.195e-01  
##                       CountryGuyana                       CountryHungary  
##                           1.856e+01                            2.927e+01  
##                      CountryIceland                       CountryIreland  
##                           1.264e+01                            1.016e+01  
##                       CountryIsrael                         CountryItaly  
##                           7.449e+00                            8.320e+00  
##                      CountryJamaica                         CountryJapan  
##                          -2.927e+00                            1.565e+01  
##                   CountryKazakhstan                      CountryKiribati  
##                           2.650e+01                            2.697e+00  
##                       CountryKuwait                    CountryKyrgyzstan  
##                           1.863e-01                            1.075e+01  
##                       CountryLatvia                     CountryLithuania  
##                           2.646e+01                            3.734e+01  
##                   CountryLuxembourg                         CountryMacau  
##                           1.919e+01                            1.187e+01  
##                     CountryMaldives                         CountryMalta  
##                          -1.548e+00                            2.541e+00  
##                    CountryMauritius                        CountryMexico  
##                           8.356e+00                            5.392e+00  
##                     CountryMongolia                    CountryMontenegro  
##                           1.371e+01                            6.967e+00  
##                  CountryNetherlands                   CountryNew Zealand  
##                           1.021e+01                            1.259e+01  
##                    CountryNicaragua                        CountryNorway  
##                           3.692e+00                            1.448e+01  
##                         CountryOman                        CountryPanama  
##                          -3.870e-01                            2.828e+00  
##                     CountryParaguay                   CountryPhilippines  
##                           8.572e-01                            2.761e+00  
##                       CountryPoland                      CountryPortugal  
##                           1.200e+01                            8.924e+00  
##                  CountryPuerto Rico                         CountryQatar  
##                           8.152e+00                            4.766e+00  
##            CountryRepublic of Korea                       CountryRomania  
##                           2.147e+01                            9.315e+00  
##           CountryRussian Federation         CountrySaint Kitts and Nevis  
##                           2.010e+01                           -3.864e+00  
##                  CountrySaint Lucia  CountrySaint Vincent and Grenadines  
##                           3.921e+00                            2.530e+00  
##                   CountrySan Marino                        CountrySerbia  
##                           5.446e+00                            1.901e+01  
##                   CountrySeychelles                     CountrySingapore  
##                           4.934e+00                            1.681e+01  
##                     CountrySlovakia                      CountrySlovenia  
##                           9.968e+00                            2.597e+01  
##                 CountrySouth Africa                         CountrySpain  
##                           5.425e-01                            8.369e+00  
##                    CountrySri Lanka                      CountrySuriname  
##                           3.025e+01                            1.789e+01  
##                       CountrySweden                   CountrySwitzerland  
##                           1.462e+01                            2.120e+01  
##                     CountryThailand           CountryTrinidad and Tobago  
##                           5.362e+00                            1.074e+01  
##                       CountryTurkey                  CountryTurkmenistan  
##                           3.713e+00                            5.208e+00  
##                      CountryUkraine          CountryUnited Arab Emirates  
##                           2.074e+01                            2.260e+00  
##               CountryUnited Kingdom                 CountryUnited States  
##                           7.678e+00                            1.185e+01  
##                      CountryUruguay                    CountryUzbekistan  
##                           1.634e+01                            5.146e+00  
##                                Year                              Sexmale  
##                          -6.534e-02                            1.338e+01  
##                      Age25-34 years                       Age35-54 years  
##                           2.969e+00                            5.502e+00  
##                       Age5-14 years                       Age55-74 years  
##                          -8.061e+00                            7.195e+00  
##                        Age75+ years                          Suicides_no  
##                           1.480e+01                            5.380e-03  
##                          Population                         Gdp_for_year  
##                          -7.596e-07                            3.170e-13  
##           GenerationG.I. Generation               GenerationGeneration X  
##                           7.106e-01                            3.763e-01  
##              GenerationGeneration Z                 GenerationMillenials  
##                           2.202e+00                            4.699e-01  
##                    GenerationSilent  
##                          -8.903e-01
summary(lm.model)
## 
## Call:
## lm(formula = Suicide_per_100k ~ 1, data = suicide)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.816 -11.896  -6.826   3.804 212.154 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  12.8161     0.1137   112.7   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.96 on 27819 degrees of freedom

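The summary above is for lm.model, the intercept-only model, so the reported p-value (< 2e-16) only shows that the mean suicide rate (12.82 per 100k) differs from zero. It says nothing about the predictors. To assess the model chosen by backward elimination, the object returned by stepAIC() would need to be stored and summarized instead.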
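
To summarize the model chosen by backward elimination rather than the intercept-only baseline, the object returned by stepAIC() can be stored and passed to summary(). The sketch below is a minimal illustration on the built-in mtcars data, not on the suicide dataset; the object names `full_model` and `selected` are assumptions made for the example.

```r
library(MASS)  # provides stepAIC()

# Full model on the built-in mtcars data (stand-in for the suicide data)
full_model <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Store the model chosen by backward elimination.
# trace = FALSE suppresses the step-by-step log.
selected <- stepAIC(full_model,
                    scope = list(upper = ~ wt + hp + qsec + drat, lower = ~ 1),
                    direction = "backward", trace = FALSE)

# Coefficient table, R-squared and overall F-test for the selected model
summary(selected)
```

The overall F-test at the bottom of that summary is the figure that speaks to whether the selected model as a whole is statistically significant.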

Closer Look at GDP and Suicide

To get a less crowded scatterplot, I grouped the data by country and continent, so each point in the graph represents one country.

suicide %>%
  group_by(Country, Continent) %>%
  summarise(population = sum(as.numeric(Population)), 
            suicides = sum(as.numeric(Suicides_no)), 
            suicides_per_100k = (suicides / population) * 100000,
            gdp_per_capita = mean(Gdp_per_capita)) %>%
  ggplot(aes(x = gdp_per_capita, y = suicides_per_100k, group = Continent)) + 
  geom_point(aes(color = Continent), size = 1) +
  labs(y = "Suicide Rates", x = "GDP Per Capita") +
  geom_smooth(method = lm, linetype = "dashed", color = "darkgreen", fill = "orange", aes(group = 1))
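
The pipeline above computes each country's rate as total suicides over total population, rather than averaging the per-row rates. The two can differ when subgroup populations are unequal, as this sketch with made-up numbers shows (the `toy` data frame and its values are hypothetical).

```r
# Hypothetical rows for one country: two demographic groups of unequal size
toy <- data.frame(
  suicides   = c(10, 90),
  population = c(100000, 300000)
)

# Per-row rates per 100k: 10 and 30
row_rates <- toy$suicides / toy$population * 100000

# Unweighted mean of the row rates treats both groups as equally large
mean_of_rates <- mean(row_rates)

# Pooled rate, as in the summarise() call: total suicides / total population
pooled_rate <- sum(toy$suicides) / sum(toy$population) * 100000

c(mean_of_rates = mean_of_rates, pooled_rate = pooled_rate)
```

Here the unweighted mean is 20 per 100k while the pooled rate is 25 per 100k, because the larger group also has the higher rate.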

Correlation:

cor.test(suicide$Gdp_per_capita, suicide$Suicide_per_100k) 
## 
##  Pearson's product-moment correlation
## 
## data:  suicide$Gdp_per_capita and suicide$Suicide_per_100k
## t = 0.29774, df = 27818, p-value = 0.7659
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.009966025  0.013535799
## sample estimates:
##         cor 
## 0.001785134

The correlation is positive but extremely weak (r ≈ 0.0018), and the large p-value (0.7659) means it is not significantly different from zero.
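
The t statistic reported by cor.test() follows from the sample correlation \(r\) and the sample size \(n = 27820\), which gives \(df = n - 2 = 27818\):

\[ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.001785134 \times \sqrt{27818}}{\sqrt{1 - 0.001785134^2}} \approx 0.2977 \]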

mod <- lm(Suicide_per_100k ~ Gdp_per_capita, data = suicide)
summary(mod)
## 
## Call:
## lm(formula = Suicide_per_100k ~ Gdp_per_capita, data = suicide)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.012 -11.894  -6.827   3.802 212.152 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    1.279e+01  1.524e-01  83.888   <2e-16 ***
## Gdp_per_capita 1.792e-06  6.019e-06   0.298    0.766    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.96 on 27818 degrees of freedom
## Multiple R-squared:  3.187e-06,  Adjusted R-squared:  -3.276e-05 
## F-statistic: 0.08865 on 1 and 27818 DF,  p-value: 0.7659

Equation:

\[ \hat{S} = 12.78587 + 0.000001792 \times gdp\_per\_capita \]
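
To get a feel for how flat this line is, the equation can be evaluated at two GDP per capita values. This is a sketch only: the coefficients are copied from the regression output, while the two GDP values (10,000 and 50,000) are hypothetical and chosen purely for illustration.

```r
# Coefficients copied from the fitted equation
b0 <- 12.78587     # intercept
b1 <- 1.792e-06    # slope for Gdp_per_capita

# Hypothetical GDP per capita values chosen for illustration
gdp <- c(10000, 50000)

# Fitted suicide rates per 100k at those two values
fitted_rate <- b0 + b1 * gdp
fitted_rate

# Change in the fitted rate across the 40,000 gap
diff(fitted_rate)
```

The fitted rate moves by only about 0.07 suicides per 100k across a 40,000 difference in GDP per capita.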

layout(matrix(c(1,2,3,4),2,2))
plot(mod)

moddf <- broom::augment(mod)
ggplot(moddf, aes(x = .resid)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The diagnostic plots above show that the model does not appear to meet the conditions of linearity, normality and constant variance.

The hypothesis test shows that the correlation is not significantly different from zero, since the p-value (0.7659) is far above 0.05, so \(H_0\) is not rejected. In other words, there is insufficient evidence to conclude that there is a linear relationship between GDP per capita and suicide rates. The fitted regression line is therefore not a useful model of that relationship in the population. Although the line in the scatterplot shows a weak linear trend, it should not be relied on for prediction, least of all outside the range of GDP values observed in the dataset.

Part 5 - Conclusion

Suicide occurs throughout the world, affecting individuals of all nations, cultures, religions, genders and classes. Moreover, the countries with the highest suicide rates in this report are remarkably diverse. The top five are the eastern European countries of Lithuania (40.42 suicides per 100k) and Russia (34.89), followed by Sri Lanka (34.89), Hungary (32.76) and Belarus (31.08).

In contrast, many of the most troubled nations in the world have comparatively low suicide rates; Mexico, for example, has 4.71 suicides per 100k. One possible explanation is that people there are preoccupied with trying to survive. It is not clear whether the suicide statistics for these countries reflect suicides driven by mental health problems, terminal illnesses or conflicts within the country, all of which are plausible factors. The islands of the Caribbean appear to have the lowest suicide rates, especially Antigua and Barbuda (0.55), Jamaica (0.52), Dominica (0) and St. Kitts and Nevis (0).

Although GDP may not directly affect suicide, it could be one of many circumstantial factors, since it reflects the economic conditions of a country. However, this analysis found no statistically significant evidence that a country's GDP per capita is related to its suicide rate. There are many reasons why people commit suicide, and not everyone is willing to talk about how they feel, but it is our job as a community to observe and lend a helping hand.

For future research, studies could investigate why countries with the strongest economic standing may have higher suicide rates than poor or developing countries.

References

Diez, David M., et al. OpenIntro Statistics. OpenIntro, 2016.

Lee, Lindsay, et al. “Suicide.” Our World in Data, 15 June 2015, ourworldindata.org/suicide.

“Suicide in America: Frequently Asked Questions.” National Institute of Mental Health, U.S. Department of Health and Human Services, www.nimh.nih.gov/health/publications/suicide-faq/index.shtml#pub9.