library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.4
## ✔ lubridate 1.9.3 ✔ stringr 1.5.0
## ✔ purrr 1.0.2 ✔ tibble 3.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Bringing awareness to suicide and suicidal thoughts is very dear to my heart. Growing up in Brooklyn, NY, suicide was always a taboo subject to me. Granted, that could have been because the people I was around just did not take mental health seriously. It wasn’t until I became comfortable enough to admit that my depression had left me in a suicidal state, and that I had thoughts of ending it all after the loss of a loved one, that I started to take mental health more seriously, seek help, and educate the people around me.
Suicide has not only affected me personally, but is also one of the biggest problems in the world, and it has increased significantly worldwide. Many factors play a role in why people attempt and commit suicide, such as mental health issues, suffering from a loss, and chronic illnesses, just to name a few. Although there are many factors behind why people contemplate or commit suicide, I will take a few variables from the data set and check their relationship with suicide numbers throughout the world. I will analyze both men and women from nearly all age groups, with the hope that this project will shed light on suicide awareness and show people that it is okay to be vulnerable and seek help. We should not have to take on all of our struggles alone; sometimes it is okay for the “strong friend” to ask for help when it is needed.
I found the data used for this project on Kaggle. This data set can be downloaded free of charge from https://www.kaggle.com/datasets/russellyates88/suicide-rates-overview-1985-to-2016. This compiled data set draws on four other data sets linked by time and place, and was built to find signals correlated with increased suicide rates among different cohorts globally, across the socio-economic spectrum.
data<- read.csv('/Users/MECCA/Desktop/master suicide.csv')
summary(data)
## country year sex age
## Length:27820 Min. :1985 Length:27820 Length:27820
## Class :character 1st Qu.:1995 Class :character Class :character
## Mode :character Median :2002 Mode :character Mode :character
## Mean :2001
## 3rd Qu.:2008
## Max. :2016
##
## suicides_no population suicides.100k.population Year
## Min. : 0.0 Min. : 278 Min. : 0.00 Min. :1985
## 1st Qu.: 3.0 1st Qu.: 97498 1st Qu.: 0.92 1st Qu.:1995
## Median : 25.0 Median : 430150 Median : 5.99 Median :2002
## Mean : 242.6 Mean : 1844794 Mean : 12.82 Mean :2001
## 3rd Qu.: 131.0 3rd Qu.: 1486143 3rd Qu.: 16.62 3rd Qu.:2008
## Max. :22338.0 Max. :43805214 Max. :224.97 Max. :2016
##
## HDI.for.year gdp_for_year.... gdp_per_capita.... generation
## Min. :0.483 Length:27820 Min. : 251 Length:27820
## 1st Qu.:0.713 Class :character 1st Qu.: 3447 Class :character
## Median :0.779 Mode :character Median : 9372 Mode :character
## Mean :0.777 Mean : 16866
## 3rd Qu.:0.855 3rd Qu.: 24874
## Max. :0.944 Max. :126352
## NA's :19456
group_by_age <- data %>% group_by(age) %>% summarise(suicides_no = sum(suicides_no))
group_by_year <- data %>% group_by(year) %>% summarise(suicides_no = sum(suicides_no))
ggplot(group_by_year, aes(x = year, y = suicides_no)) +
geom_line(color="hotpink2") +
labs(title = 'Total Suicides per Year', x = 'Year', y = 'Number of Suicides') +
theme_minimal()
ggplot(group_by_year, aes(x = reorder(year, suicides_no), y = suicides_no, fill = year)) +
geom_bar(stat = "identity") +
labs(title = 'Yearly Suicides', x = 'Year', y = 'Number of Suicides') +
theme_minimal() +
coord_flip()
year_most_suicides <- group_by_year[which.max(group_by_year$suicides_no), ]
year_least_suicides <- group_by_year[which.min(group_by_year$suicides_no), ]
list(most = year_most_suicides, least = year_least_suicides)
## $most
## # A tibble: 1 × 2
## year suicides_no
## <int> <int>
## 1 1999 256119
##
## $least
## # A tibble: 1 × 2
## year suicides_no
## <int> <int>
## 1 2016 15603
gender_suicides <- data %>%
group_by(year, sex) %>%
summarise(suicides_no = sum(suicides_no, na.rm = TRUE), .groups = "drop") %>%
# one row per year, with separate female and male columns
pivot_wider(names_from = sex, values_from = suicides_no)
From my line plot and bar graph, I noticed that annual suicide counts stayed below 150K from 1985 through 1988, before rising to about 160K in 1989. This lower count could be related to the level of awareness of mental health and suicide in the 1980s. After noticing this, I did more research, and the article “Suicide in the elderly” supports this claim.
year | female | male |
---|---|---|
1985 | 32479 | 83584 |
1986 | 33852 | 86818 |
1987 | 35006 | 91836 |
1988 | 33015 | 88011 |
1989 | 41361 | 118883 |
1990 | 50118 | 143243 |
1991 | 49622 | 148398 |
1992 | 51567 | 159906 |
1993 | 51331 | 170234 |
1994 | 51532 | 180531 |
1995 | 54504 | 189040 |
1996 | 54583 | 192142 |
1997 | 54126 | 186619 |
1998 | 55631 | 193960 |
1999 | 56215 | 199904 |
2000 | 55254 | 200578 |
2001 | 52999 | 197653 |
2002 | 55549 | 200546 |
2003 | 55627 | 200452 |
2004 | 53232 | 187629 |
2005 | 52035 | 182340 |
2006 | 52039 | 181322 |
2007 | 53324 | 180084 |
2008 | 53973 | 181474 |
2009 | 54920 | 188567 |
2010 | 54222 | 184480 |
2011 | 54616 | 181868 |
2012 | 53011 | 177149 |
2013 | 51459 | 171740 |
2014 | 51556 | 171428 |
2015 | 47248 | 156392 |
2016 | 3504 | 12099 |
I created this pivot table to go into more detail. It shows the total number of women and men who committed suicide each year.
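Before reading the yearly totals as a trend, it may be worth checking whether each year covers the same set of countries. The very small 2016 row in the table above hints that it might not. The sketch below is a minimal illustration that assumes a small made-up data frame (`toy`). It borrows the column names `country`, `year` and `suicides_no` from the Kaggle file, but all of its numbers are hypothetical.

```r
library(dplyr)

# Hypothetical toy data (made-up numbers) using the same column names as the Kaggle file;
# swap `toy` for the real `data` object to run the check on the full data set.
toy <- data.frame(
  country     = c("A", "B", "C", "A", "B", "C", "A"),
  year        = c(2014, 2014, 2014, 2015, 2015, 2015, 2016),
  suicides_no = c(10, 20, 30, 12, 18, 25, 4),
  stringsAsFactors = FALSE
)

# Total suicides and number of distinct reporting countries for each year
coverage <- toy %>%
  group_by(year) %>%
  summarise(
    suicides_no = sum(suicides_no, na.rm = TRUE),
    n_countries = n_distinct(country),
    .groups = "drop"
  )

coverage
```

If the real data showed far fewer countries in a low year such as 2016, that drop would reflect coverage rather than fewer suicides.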
highest_suicide_country <- data %>% group_by(country) %>%
summarise(suicides_no = sum(suicides_no)) %>%
arrange(desc(suicides_no)) %>%
top_n(10, suicides_no)
ggplot(highest_suicide_country, aes(x = reorder(country, suicides_no), y = suicides_no, fill = country)) +
geom_bar(stat = "identity") +
labs(title = 'Top 10 Countries by Suicides', x = 'Country', y = 'Number of Suicides') +
theme_minimal() +
coord_flip()
groupby_country <- data %>% group_by(country) %>% summarise(suicides_no = sum(suicides_no))
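# Note: this sums population over every year, sex and age group, so each total is cumulative person-years rather than a single-year population
group_by_population <- data %>% group_by(country) %>% summarise(population = sum(population))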
print(group_by_population)
## # A tibble: 101 × 2
## country population
## <chr> <dbl>
## 1 Albania 62325467
## 2 Antigua and Barbuda 1990228
## 3 Argentina 1035985431
## 4 Armenia 77348173
## 5 Aruba 1259677
## 6 Australia 542377786
## 7 Austria 243853094
## 8 Azerbaijan 111790300
## 9 Bahamas 6557048
## 10 Bahrain 16753926
## # ℹ 91 more rows
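# Total suicides for each country, sex, age group, population and year combination
group_by_CYSAP <- data %>% group_by(country, sex, age, population, year) %>%
summarise(suicides_no = sum(suicides_no)) %>%
arrange(desc(suicides_no)) %>%
# the result is still grouped by country, sex, age and population, so top_n() works
# within each small group and effectively keeps every row
top_n(101, suicides_no)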
## `summarise()` has grouped output by 'country', 'sex', 'age', 'population'. You
## can override using the `.groups` argument.
country_most_suicides <- group_by_CYSAP[which.max(group_by_CYSAP$suicides_no), ]
country_least_suicides <- group_by_CYSAP[which.min(group_by_CYSAP$suicides_no), ]
list(most = country_most_suicides, least = country_least_suicides)
## $most
## # A tibble: 1 × 6
## # Groups: country, sex, age, population [1]
## country sex age population year suicides_no
## <chr> <chr> <chr> <int> <int> <int>
## 1 Russian Federation male 35-54 years 19044200 1994 22338
##
## $least
## # A tibble: 1 × 6
## # Groups: country, sex, age, population [1]
## country sex age population year suicides_no
## <chr> <chr> <chr> <int> <int> <int>
## 1 Albania female 15-24 years 270003 2009 0
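A note on the min/max lookup: `which.min()` returns only the index of the first element equal to the minimum. When many rows tie at 0, it says nothing about the others. A minimal sketch with hypothetical country names and counts shows the difference between `which.min()` and keeping every tied row with `dplyr::filter()`.

```r
library(dplyr)

# Hypothetical cells (made-up names and counts); two rows tie at the minimum of 0
cells <- data.frame(
  country     = c("Country A", "Country B", "Country C", "Country D"),
  suicides_no = c(0, 0, 3, 150),
  stringsAsFactors = FALSE
)

# which.min() picks only the first row that attains the minimum
first_min <- cells[which.min(cells$suicides_no), ]

# filter() keeps every row tied at the minimum
all_min <- cells %>% filter(suicides_no == min(suicides_no))

first_min$country
all_min$country
```

On the real `group_by_CYSAP` table, which is still grouped, you would need to call `ungroup()` before this `filter()`. Otherwise the minimum is taken within each group.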
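Both the top 10 chart and the max lookup above confirm that the Russian Federation has the largest suicide count. It leads the country totals, and the single largest cell in the data set is Russian Federation men aged 35-54 in 1994, with 22,338 suicides. The min lookup is less informative: which.min() returns only the first row holding the minimum, and the row for Albanian women aged 15-24 in 2009 is one of many rows recording 0 suicides, so it does not show that Albania had the lowest count overall. One reason the Russian Federation may have such a large suicide count is its very large population (Albania has a population of 2.8 million, while the Russian Federation has a population of 144.3 million). It has also been reported that Russian levels of alcohol consumption play an immense role in its large suicide count, but there is a lack of data to support this due to Soviet secrecy.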
“Russian levels of alcohol consumption and suicide are among the highest in the world.”
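Raw counts grow with population size, so a rate puts large and small countries on a more equal footing. The data set already carries a per-row `suicides.100k.population` column. The sketch below uses made-up countries and numbers to show a country-level equivalent: it divides total suicides by total population and scales the result by 100,000. Pooling population across years this way is an assumption of the sketch, not the method used above.

```r
library(dplyr)

# Hypothetical country-year rows (made-up numbers, not from the Kaggle file)
toy <- data.frame(
  country     = c("Large country", "Large country", "Small country", "Small country"),
  year        = c(2000, 2001, 2000, 2001),
  suicides_no = c(4000, 6000, 60, 60),
  population  = c(2e7, 2e7, 2e5, 2e5),
  stringsAsFactors = FALSE
)

# Suicides per 100,000 people, pooled over all years for each country
rates <- toy %>%
  group_by(country) %>%
  summarise(
    suicides_no = sum(suicides_no),
    population  = sum(population),
    rate_100k   = suicides_no / population * 1e5,
    .groups = "drop"
  ) %>%
  arrange(desc(rate_100k))

rates
```

Here the large country has far more suicides (10,000 vs. 120) but a lower rate (25 vs. 30 per 100,000). Raw counts alone hide that kind of comparison.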
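# Order age groups by their lower bound (5-14, 15-24, ..., 75+) rather than alphabetically
group_by_age <- group_by_age %>%
mutate(age = reorder(age, as.numeric(sub("[-+].*", "", age))))
ggplot(group_by_age, aes(x = age, y = suicides_no, fill = age)) +
geom_bar(stat = "identity") +
labs(title = 'Suicides by Age Group', x = 'Age Group', y = 'Number of Suicides') +
theme_minimal()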
The bar graph shows that people aged 35 to 54 have the highest suicide count, while those aged 55 to 74 have the second highest. Part of this is because these two groups each cover 20 years of age, compared with 10 for the younger groups. This high count among adults 35 and older may also be related to the “U-shaped happiness curve.” When people reach middle age, they may review their earlier goals in the context of their achievements. For some, the realization of unmet aspirations, or the perceived failure to accomplish goals set as young adults, could lead to a midlife low.
The U-shape of Happiness Across the Life Course: Expanding the Discussion
group_by_sex <- data %>% group_by(sex) %>% summarise(suicides_no = sum(suicides_no))
ggplot(group_by_sex, aes(x = sex, y = suicides_no, fill = sex)) +
geom_bar(stat = "identity") +
labs(title = 'Suicides by Gender', x = 'Gender', y = 'Number of Suicides') +
theme_minimal()
The chart above shows that men are more likely to commit suicide than women. Why is that? For years, boys and men have been told that it is not okay to cry and that showing emotions makes them less manly. Although both women and men deal with depression, women are more likely than men to seek help for it. Men place a strong value on independence and purposefulness, and some see admitting that they need help as a sign of weakness and avoid it. Meanwhile, although women also value their independence, they are more willing to consult friends and more likely to accept help.
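To put a number on the gap between men and women, a short sketch can compute how many male suicides there were for every female suicide. The three rows below are copied from the year/female/male pivot table above. The choice of these three years is just for illustration.

```r
library(dplyr)

# Three rows copied from the year/female/male pivot table above
gender_rows <- data.frame(
  year   = c(1985, 1999, 2015),
  female = c(32479, 56215, 47248),
  male   = c(83584, 199904, 156392)
)

# Male suicides per female suicide in each year, rounded to two decimals
gender_rows <- gender_rows %>%
  mutate(male_to_female = round(male / female, 2))

gender_rows
```

By this measure, male suicides outnumbered female suicides by roughly 2.6 to 3.6 times in these years.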
Although annual suicide counts were lower before the 1990s, they rose sharply during that decade and peaked in 1999 in this data set. Suicide is something that should be talked about more often, because if more people knew they were not alone, they might be more comfortable reaching out for help before it is too late. It is evident that middle-aged men are more likely to commit suicide, and the difference between men’s and women’s suicide counts is alarming. Mental health should not be brushed off, because it is a major predictor of suicide. If you know someone who is suffering, please use the resources below.
https://www.ssmhealth.com/blogs/ssm-health-matters/october-2019/middle-aged-men-more-likely-to-die-by-suicide
https://www.cambridge.org/core/journals/advances-in-psychiatric-treatment/article/suicide-in-the-elderly/A4A9F7695DCA8D9B2796453FF166B8F3
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1642767/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7529452/
https://www.bbc.com/future/article/20190313-why-more-men-kill-themselves-than-women