General Analysis of Global Suicide Trends from 1985 to 2016
Introduction
Suicide presents a major challenge to public health in the United States and worldwide. It contributes to premature death, morbidity, lost productivity, and health care costs. In 2015 (the most recent year of available death data at the time), suicide was responsible for 44,193 deaths in the U.S., which is approximately one suicide every 12 minutes. In 2015, suicide ranked as the 10th leading cause of death in the U.S., and it has been among the top 12 leading causes of death since 1975. Overall suicide rates increased 28% from 2000 to 2015. Suicide is a problem throughout the life span: it is the third leading cause of death for youth 10–14 years of age, the second leading cause among people 15–24 and 25–34 years of age, the fourth leading cause among people 35–44 years of age, the fifth leading cause among people 45–54 years of age, and the eighth leading cause among people 55–64 years of age.
There are a number of packages we need to load before analyzing and visualizing the dataset.
# Load the packages (pacman installs any that are missing); duplicates removed
pacman::p_load(tidyverse, tmap, tmaptools, leaflet, sf, rio, leaflet.extras, dplyr, sp, treemap, RColorBrewer, factoextra, knitr, kableExtra,
highcharter, ggthemes, ggcorrplot)
#Set highcharter options
options(highcharter.theme = hc_theme_smpl(tooltip = list(valueDecimals = 2)))
Setting the working directory and reading in the dataset
setwd("C:/Users/clovi/OneDrive/Desktop/DATA 110")
suicide_data <- read_csv("suicide.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## country = col_character(),
## year = col_double(),
## sex = col_character(),
## age = col_character(),
## suicides_no = col_double(),
## population = col_double(),
## `suicides/100k pop` = col_double(),
## `country-year` = col_character(),
## `HDI for year` = col_double(),
## `gdp_for_year ($)` = col_number(),
## `gdp_per_capita ($)` = col_double(),
## generation = col_character()
## )
suicide_data
## # A tibble: 27,820 x 12
## country year sex age suicides_no population `suicides/100k pop`
## <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 Albania 1987 male 15-24 years 21 312900 6.71
## 2 Albania 1987 male 35-54 years 16 308000 5.19
## 3 Albania 1987 female 15-24 years 14 289700 4.83
## 4 Albania 1987 male 75+ years 1 21800 4.59
## 5 Albania 1987 male 25-34 years 9 274300 3.28
## 6 Albania 1987 female 75+ years 1 35600 2.81
## 7 Albania 1987 female 35-54 years 6 278800 2.15
## 8 Albania 1987 female 25-34 years 4 257200 1.56
## 9 Albania 1987 male 55-74 years 1 137500 0.73
## 10 Albania 1987 female 5-14 years 0 311000 0
## # ... with 27,810 more rows, and 5 more variables: country-year <chr>,
## # HDI for year <dbl>, gdp_for_year ($) <dbl>, gdp_per_capita ($) <dbl>,
## # generation <chr>
# Remove unnecessary columns and rename the GDP variables
suicide_df <- suicide_data %>%
select(-c('HDI for year', 'suicides/100k pop','country-year')) %>%
rename(gdp_per_year = 'gdp_for_year ($)',
gdp_per_capital = 'gdp_per_capita ($)') %>%
as.data.frame()
str(suicide_df)
## 'data.frame': 27820 obs. of 9 variables:
## $ country : chr "Albania" "Albania" "Albania" "Albania" ...
## $ year : num 1987 1987 1987 1987 1987 ...
## $ sex : chr "male" "male" "female" "male" ...
## $ age : chr "15-24 years" "35-54 years" "15-24 years" "75+ years" ...
## $ suicides_no : num 21 16 14 1 9 1 6 4 1 0 ...
## $ population : num 312900 308000 289700 21800 274300 ...
## $ gdp_per_year : num 2.16e+09 2.16e+09 2.16e+09 2.16e+09 2.16e+09 ...
## $ gdp_per_capital: num 796 796 796 796 796 796 796 796 796 796 ...
## $ generation : chr "Generation X" "Silent" "Generation X" "G.I. Generation" ...
# Data is missing for some countries and many countries have no data at all for 2016, so we drop 2016
cleaned_suicide_df <- suicide_df %>%
filter(year != 2016)
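One way to check the claim that 2016 is incomplete before dropping it is to count how many distinct countries report data in each year. The sketch below is a minimal base-R version of that check. It runs on a hypothetical toy panel (`toy` is invented for illustration), and `coverage` is an illustrative name:

```r
# Hypothetical toy panel standing in for suicide_df: one row per
# country-year-sex-age cell; country "C" has no 2016 rows at all.
toy <- data.frame(
  country = c("A", "A", "B", "B", "C", "A", "B"),
  year    = c(2015, 2015, 2015, 2015, 2015, 2016, 2016)
)

# Keep one row per country-year, then count countries reporting in each year
country_year <- unique(toy[, c("country", "year")])
countries_per_year <- table(country_year$year)
print(countries_per_year)

# Coverage of each year relative to the best-covered year
coverage <- countries_per_year / max(countries_per_year)
print(round(coverage, 2))
```

On the real data, the same idea would be `table(unique(suicide_df[, c("country", "year")])$year)`. A year whose count falls well below its neighbours is a candidate for exclusion.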
World suicide rate over the whole period (all years pooled)
Worlds_rate <- (sum(as.numeric(cleaned_suicide_df$suicides_no)) / sum(as.numeric(cleaned_suicide_df$population))) * 10^5
unique(cleaned_suicide_df$year) %>% sort()
## [1] 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
## [16] 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
## [31] 2015
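`Worlds_rate` is a crude rate: total suicides divided by total population, scaled to 100,000 people. The sketch below wraps that formula in a small helper (`rate_per_100k` is a hypothetical name, not part of any package). It also shows why the `as.numeric()` calls are a sensible precaution. If the columns were read as integers (as base `read.csv()` can do), summing billions of people would overflow R's 32-bit integers:

```r
# Crude rate per 100,000: total events / total population at risk * 10^5.
# Converting to double first avoids integer overflow, because sum() on an
# integer vector returns NA (with a warning) once the total exceeds
# .Machine$integer.max (about 2.1 billion).
rate_per_100k <- function(suicides, population) {
  sum(as.numeric(suicides)) / sum(as.numeric(population)) * 1e5
}

# Hypothetical counts: 30 deaths in a population of 200,000
rate_per_100k(c(10L, 20L), c(120000L, 80000L))  # 15

# Why as.numeric() matters: two integers at the 32-bit limit
big_pop <- rep(.Machine$integer.max, 2L)
suppressWarnings(sum(big_pop))  # NA: the integer sum overflows
sum(as.numeric(big_pop))        # 4294967294 as a double
```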
By year
df1 <- cleaned_suicide_df %>% group_by(year) %>% summarize(rate = ((sum(suicides_no))/sum(population))*10^5)
Suicide rate by year
df1
## # A tibble: 31 x 2
## year rate
## <dbl> <dbl>
## 1 1985 11.5
## 2 1986 11.7
## 3 1987 11.6
## 4 1988 11.5
## 5 1989 13.1
## 6 1990 13.2
## 7 1991 13.3
## 8 1992 13.5
## 9 1993 14.5
## 10 1994 15.0
## # ... with 21 more rows
Mean of the yearly suicide rates
mean(df1$rate)
## [1] 13.12023
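The mean of the yearly rates, `mean(df1$rate)`, need not equal the pooled `Worlds_rate`. The mean gives every year equal weight, while the pooled rate weights each year by its population. The pooled rate is exactly the population-weighted mean of the yearly rates. A small sketch with two hypothetical years (all numbers invented for illustration) shows the difference:

```r
# Two hypothetical years with different population sizes
years <- data.frame(
  year       = c(1, 2),
  suicides   = c(10, 60),
  population = c(100000, 200000)
)

yearly_rate   <- years$suicides / years$population * 1e5      # 10 and 30
mean_of_rates <- mean(yearly_rate)                             # equal weights
pooled_rate   <- sum(years$suicides) / sum(years$population) * 1e5

# The pooled rate equals the population-weighted mean of the yearly rates
weights       <- years$population / sum(years$population)
weighted_rate <- sum(weights * yearly_rate)

c(mean = mean_of_rates, pooled = pooled_rate, weighted = weighted_rate)
```

Here the simple mean is 20, while the pooled and weighted rates are both 70/3 (about 23.3), because the larger year has the higher rate.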
cleaned_suicide_df %>%
group_by(age) %>%
summarise(suicide_per_100k = (sum(suicides_no) / sum(population)) * 10^5)
## # A tibble: 6 x 2
## age suicide_per_100k
## <chr> <dbl>
## 1 15-24 years 9.36
## 2 25-34 years 13.3
## 3 35-54 years 17.1
## 4 5-14 years 0.622
## 5 55-74 years 18.9
## 6 75+ years 24.5
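The age table above is sorted alphabetically, so "5-14 years" lands between "35-54 years" and "55-74 years". Converting `age` to a factor with explicit levels fixes the order for tables and for ggplot2 axes, which follow factor level order. A minimal base-R sketch (`age_levels` is an illustrative name):

```r
# Age brackets in the alphabetical order produced by summarise()
age <- c("15-24 years", "25-34 years", "35-54 years",
         "5-14 years", "55-74 years", "75+ years")

# Explicit youngest-to-oldest order
age_levels <- c("5-14 years", "15-24 years", "25-34 years",
                "35-54 years", "55-74 years", "75+ years")
age_f <- factor(age, levels = age_levels)

sort(age_f)  # now runs from "5-14 years" to "75+ years"
```

In the dplyr pipeline, the equivalent step is `mutate(age = factor(age, levels = age_levels))`. Any bracket spelled differently from the levels would become `NA`, so it is worth checking `anyNA()` afterwards.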
By country, population and suicide number
dataframe <- cleaned_suicide_df %>% group_by(country, population,suicides_no,year) %>%
summarize(Rate = ((sum(suicides_no))/sum(population))*10^5)
## `summarise()` has grouped output by 'country', 'population', 'suicides_no'. You can override using the `.groups` argument.
dataframe
## # A tibble: 27,628 x 5
## # Groups: country, population, suicides_no [27,401]
## country population suicides_no year Rate
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Albania 21800 1 1987 4.59
## 2 Albania 22300 1 1988 4.48
## 3 Albania 22500 2 1989 8.89
## 4 Albania 23900 0 1992 0
## 5 Albania 24200 1 1993 4.13
## 6 Albania 24600 2 1994 8.13
## 7 Albania 24900 1 2000 4.02
## 8 Albania 25100 1 1995 3.98
## 9 Albania 25400 2 1996 7.87
## 10 Albania 25400 3 1997 11.8
## # ... with 27,618 more rows
Suicide rate by country
df2<- cleaned_suicide_df %>%
group_by(country) %>%
summarize(Rate = ((sum(suicides_no))/sum(population))*10^5)
df2
## # A tibble: 100 x 2
## country Rate
## <chr> <dbl>
## 1 Albania 3.16
## 2 Antigua and Barbuda 0.553
## 3 Argentina 7.94
## 4 Armenia 2.45
## 5 Aruba 8.02
## 6 Australia 12.9
## 7 Austria 20.7
## 8 Azerbaijan 1.48
## 9 Bahamas 1.42
## 10 Bahrain 2.76
## # ... with 90 more rows
# rnaturalearth supplies the world map; rgeos has been retired from CRAN and is not needed with sf
pacman::p_load(sf, rnaturalearth, rnaturalearthdata)
world <- ne_countries(scale = 'medium', returnclass='sf')
df2 <- df2 %>% arrange(country)
df2
## # A tibble: 100 x 2
## country Rate
## <chr> <dbl>
## 1 Albania 3.16
## 2 Antigua and Barbuda 0.553
## 3 Argentina 7.94
## 4 Armenia 2.45
## 5 Aruba 8.02
## 6 Australia 12.9
## 7 Austria 20.7
## 8 Azerbaijan 1.48
## 9 Bahamas 1.42
## 10 Bahrain 2.76
## # ... with 90 more rows
ndf2 <- df2$country
nw <- world$name
tf <- nw %in% ndf2
df4 <- df2 %>% rename(name = country)
world_2 <- merge(world, df4, all.x=T)
world
## Simple feature collection with 241 features and 63 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.99893 xmax: 180 ymax: 83.59961
## CRS: +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## First 10 features:
## scalerank featurecla labelrank sovereignt sov_a3 adm0_dif
## 0 3 Admin-0 country 5 Netherlands NL1 1
## 1 1 Admin-0 country 3 Afghanistan AFG 0
## 2 1 Admin-0 country 3 Angola AGO 0
## 3 1 Admin-0 country 6 United Kingdom GB1 1
## 4 1 Admin-0 country 6 Albania ALB 0
## 5 3 Admin-0 country 6 Finland FI1 1
## 6 3 Admin-0 country 6 Andorra AND 0
## 7 1 Admin-0 country 4 United Arab Emirates ARE 0
## 8 1 Admin-0 country 2 Argentina ARG 0
## 9 1 Admin-0 country 6 Armenia ARM 0
## level type admin adm0_a3 geou_dif
## 0 2 Country Aruba ABW 0
## 1 2 Sovereign country Afghanistan AFG 0
## 2 2 Sovereign country Angola AGO 0
## 3 2 Dependency Anguilla AIA 0
## 4 2 Sovereign country Albania ALB 0
## 5 2 Country Aland ALD 0
## 6 2 Sovereign country Andorra AND 0
## 7 2 Sovereign country United Arab Emirates ARE 0
## 8 2 Sovereign country Argentina ARG 0
## 9 2 Sovereign country Armenia ARM 0
## geounit gu_a3 su_dif subunit su_a3 brk_diff
## 0 Aruba ABW 0 Aruba ABW 0
## 1 Afghanistan AFG 0 Afghanistan AFG 0
## 2 Angola AGO 0 Angola AGO 0
## 3 Anguilla AIA 0 Anguilla AIA 0
## 4 Albania ALB 0 Albania ALB 0
## 5 Aland ALD 0 Aland ALD 0
## 6 Andorra AND 0 Andorra AND 0
## 7 United Arab Emirates ARE 0 United Arab Emirates ARE 0
## 8 Argentina ARG 0 Argentina ARG 0
## 9 Armenia ARM 0 Armenia ARM 0
## name name_long brk_a3 brk_name
## 0 Aruba Aruba ABW Aruba
## 1 Afghanistan Afghanistan AFG Afghanistan
## 2 Angola Angola AGO Angola
## 3 Anguilla Anguilla AIA Anguilla
## 4 Albania Albania ALB Albania
## 5 Aland Aland Islands ALD Aland
## 6 Andorra Andorra AND Andorra
## 7 United Arab Emirates United Arab Emirates ARE United Arab Emirates
## 8 Argentina Argentina ARG Argentina
## 9 Armenia Armenia ARM Armenia
## brk_group abbrev postal formal_en formal_fr note_adm0
## 0 <NA> Aruba AW Aruba <NA> Neth.
## 1 <NA> Afg. AF Islamic State of Afghanistan <NA> <NA>
## 2 <NA> Ang. AO People's Republic of Angola <NA> <NA>
## 3 <NA> Ang. AI <NA> <NA> U.K.
## 4 <NA> Alb. AL Republic of Albania <NA> <NA>
## 5 <NA> Aland AI Åland Islands <NA> Fin.
## 6 <NA> And. AND Principality of Andorra <NA> <NA>
## 7 <NA> U.A.E. AE United Arab Emirates <NA> <NA>
## 8 <NA> Arg. AR Argentine Republic <NA> <NA>
## 9 <NA> Arm. ARM Republic of Armenia <NA> <NA>
## note_brk name_sort name_alt mapcolor7 mapcolor8 mapcolor9
## 0 <NA> Aruba <NA> 4 2 2
## 1 <NA> Afghanistan <NA> 5 6 8
## 2 <NA> Angola <NA> 3 2 6
## 3 <NA> Anguilla <NA> 6 6 6
## 4 <NA> Albania <NA> 1 4 1
## 5 <NA> Aland <NA> 4 1 4
## 6 <NA> Andorra <NA> 1 4 1
## 7 <NA> United Arab Emirates <NA> 2 1 3
## 8 <NA> Argentina <NA> 3 1 3
## 9 <NA> Armenia <NA> 3 1 2
## mapcolor13 pop_est gdp_md_est pop_year lastcensus gdp_year
## 0 9 103065 2258.0 NA 2010 NA
## 1 7 28400000 22270.0 NA 1979 NA
## 2 1 12799293 110300.0 NA 1970 NA
## 3 3 14436 108.9 NA NA NA
## 4 6 3639453 21810.0 NA 2001 NA
## 5 6 27153 1563.0 NA NA NA
## 6 8 83888 3660.0 NA 1989 NA
## 7 3 4798491 184300.0 NA 2010 NA
## 8 13 40913584 573900.0 NA 2010 NA
## 9 10 2967004 18770.0 NA 2001 NA
## economy income_grp wikipedia fips_10 iso_a2
## 0 6. Developing region 2. High income: nonOECD NA <NA> AW
## 1 7. Least developed region 5. Low income NA <NA> AF
## 2 7. Least developed region 3. Upper middle income NA <NA> AO
## 3 6. Developing region 3. Upper middle income NA <NA> AI
## 4 6. Developing region 4. Lower middle income NA <NA> AL
## 5 2. Developed region: nonG7 1. High income: OECD NA <NA> AX
## 6 2. Developed region: nonG7 2. High income: nonOECD NA <NA> AD
## 7 6. Developing region 2. High income: nonOECD NA <NA> AE
## 8 5. Emerging region: G20 3. Upper middle income NA <NA> AR
## 9 6. Developing region 4. Lower middle income NA <NA> AM
## iso_a3 iso_n3 un_a3 wb_a2 wb_a3 woe_id adm0_a3_is adm0_a3_us adm0_a3_un
## 0 ABW 533 533 AW ABW NA ABW ABW NA
## 1 AFG 004 004 AF AFG NA AFG AFG NA
## 2 AGO 024 024 AO AGO NA AGO AGO NA
## 3 AIA 660 660 <NA> <NA> NA AIA AIA NA
## 4 ALB 008 008 AL ALB NA ALB ALB NA
## 5 ALA 248 248 <NA> <NA> NA ALA ALD NA
## 6 AND 020 020 AD ADO NA AND AND NA
## 7 ARE 784 784 AE ARE NA ARE ARE NA
## 8 ARG 032 032 AR ARG NA ARG ARG NA
## 9 ARM 051 051 AM ARM NA ARM ARM NA
## adm0_a3_wb continent region_un subregion region_wb
## 0 NA North America Americas Caribbean Latin America & Caribbean
## 1 NA Asia Asia Southern Asia South Asia
## 2 NA Africa Africa Middle Africa Sub-Saharan Africa
## 3 NA North America Americas Caribbean Latin America & Caribbean
## 4 NA Europe Europe Southern Europe Europe & Central Asia
## 5 NA Europe Europe Northern Europe Europe & Central Asia
## 6 NA Europe Europe Southern Europe Europe & Central Asia
## 7 NA Asia Asia Western Asia Middle East & North Africa
## 8 NA South America Americas South America Latin America & Caribbean
## 9 NA Asia Asia Western Asia Europe & Central Asia
## name_len long_len abbrev_len tiny homepart geometry
## 0 5 5 5 4 NA MULTIPOLYGON (((-69.89912 1...
## 1 11 11 4 NA 1 MULTIPOLYGON (((74.89131 37...
## 2 6 6 4 NA 1 MULTIPOLYGON (((14.19082 -5...
## 3 8 8 4 NA NA MULTIPOLYGON (((-63.00122 1...
## 4 7 7 4 NA 1 MULTIPOLYGON (((20.06396 42...
## 5 5 13 5 5 NA MULTIPOLYGON (((20.61133 60...
## 6 7 7 4 5 1 MULTIPOLYGON (((1.706055 42...
## 7 20 20 6 NA 1 MULTIPOLYGON (((53.92783 24...
## 8 9 9 4 NA 1 MULTIPOLYGON (((-64.54917 -...
## 9 7 7 4 NA 1 MULTIPOLYGON (((45.55234 40...
# Count the map features that received a rate (the merged column is `Rate`, not `rate`)
sum(!is.na(world_2$Rate))
nw[!tf]
## [1] "Afghanistan" "Angola"
## [3] "Anguilla" "Aland"
## [5] "Andorra" "American Samoa"
## [7] "Antarctica" "Ashmore and Cartier Is."
## [9] "Fr. S. Antarctic Lands" "Antigua and Barb."
## [11] "Burundi" "Benin"
## [13] "Burkina Faso" "Bangladesh"
## [15] "Bosnia and Herz." "St-Barthélemy"
## [17] "Bermuda" "Bolivia"
## [19] "Brunei" "Bhutan"
## [21] "Botswana" "Central African Rep."
## [23] "China" "Côte d'Ivoire"
## [25] "Cameroon" "Dem. Rep. Congo"
## [27] "Congo" "Cook Is."
## [29] "Comoros" "Cape Verde"
## [31] "Curaçao" "Cayman Is."
## [33] "N. Cyprus" "Czech Rep."
## [35] "Djibouti" "Dominican Rep."
## [37] "Algeria" "Egypt"
## [39] "Eritrea" "Ethiopia"
## [41] "Falkland Is." "Faeroe Is."
## [43] "Micronesia" "Gabon"
## [45] "Guernsey" "Ghana"
## [47] "Guinea" "Gambia"
## [49] "Guinea-Bissau" "Eq. Guinea"
## [51] "Greenland" "Guam"
## [53] "Hong Kong" "Heard I. and McDonald Is."
## [55] "Honduras" "Haiti"
## [57] "Indonesia" "Isle of Man"
## [59] "India" "Indian Ocean Ter."
## [61] "Br. Indian Ocean Ter." "Iran"
## [63] "Iraq" "Jersey"
## [65] "Jordan" "Siachen Glacier"
## [67] "Kenya" "Cambodia"
## [69] "St. Kitts and Nevis" "Korea"
## [71] "Kosovo" "Lao PDR"
## [73] "Lebanon" "Liberia"
## [75] "Libya" "Liechtenstein"
## [77] "Lesotho" "Macao"
## [79] "St-Martin" "Morocco"
## [81] "Monaco" "Moldova"
## [83] "Madagascar" "Marshall Is."
## [85] "Macedonia" "Mali"
## [87] "Myanmar" "Mongolia"
## [89] "N. Mariana Is." "Mozambique"
## [91] "Mauritania" "Montserrat"
## [93] "Malawi" "Malaysia"
## [95] "Namibia" "New Caledonia"
## [97] "Niger" "Norfolk Island"
## [99] "Nigeria" "Niue"
## [101] "Nepal" "Nauru"
## [103] "Pakistan" "Pitcairn Is."
## [105] "Peru" "Palau"
## [107] "Papua New Guinea" "Dem. Rep. Korea"
## [109] "Palestine" "Fr. Polynesia"
## [111] "Russia" "Rwanda"
## [113] "W. Sahara" "Saudi Arabia"
## [115] "Sudan" "S. Sudan"
## [117] "Senegal" "S. Geo. and S. Sandw. Is."
## [119] "Saint Helena" "Solomon Is."
## [121] "Sierra Leone" "Somaliland"
## [123] "Somalia" "St. Pierre and Miquelon"
## [125] "São Tomé and Principe" "Swaziland"
## [127] "Sint Maarten" "Syria"
## [129] "Turks and Caicos Is." "Chad"
## [131] "Togo" "Tajikistan"
## [133] "Timor-Leste" "Tonga"
## [135] "Tunisia" "Taiwan"
## [137] "Tanzania" "Uganda"
## [139] "Vatican" "St. Vin. and Gren."
## [141] "Venezuela" "British Virgin Is."
## [143] "U.S. Virgin Is." "Vietnam"
## [145] "Vanuatu" "Wallis and Futuna Is."
## [147] "Samoa" "Yemen"
## [149] "Zambia" "Zimbabwe"
world$name %>% length()
## [1] 241
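The list above shows map names that are absent from the data, and most of them are simply countries the dataset does not cover. The direction that matters for the map is the reverse: dataset countries with no matching polygon, which lose their rate. That check is `setdiff(df2$country, world$name)`. Two mismatches are visible in the outputs above: "Antigua and Barbuda" appears as "Antigua and Barb." and "Russian Federation" appears as "Russia". A minimal lookup-table sketch follows. The helper `harmonise` and the toy vectors are hypothetical, and any further name pairs would have to be read off the real `setdiff()` result rather than guessed:

```r
# Hypothetical stand-ins for df2$country and world$name
data_names <- c("Albania", "Antigua and Barbuda", "Russian Federation")
map_names  <- c("Albania", "Angola", "Antigua and Barb.", "Russia")

# Dataset countries with no matching polygon: these lose their rate on the map
unmatched <- setdiff(data_names, map_names)
print(unmatched)

# Dataset spelling -> Natural Earth spelling (extend after checking setdiff())
name_fix <- c("Antigua and Barbuda" = "Antigua and Barb.",
              "Russian Federation"  = "Russia")

# Replace only the names that have an entry in the lookup
harmonise <- function(x, lookup) {
  hit <- x %in% names(lookup)
  x[hit] <- unname(lookup[x[hit]])
  x
}

fixed <- harmonise(data_names, name_fix)
print(setdiff(fixed, map_names))  # character(0): every country now matches
```

Applied before the merge, this would look like `df4 <- df2 %>% mutate(country = harmonise(country, name_fix)) %>% rename(name = country)`.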
World Map showing the suicide rate in different countries
ggplot(data = world_2)+
geom_sf(aes(fill = Rate))+
scale_fill_viridis_c(option = "plasma", trans='sqrt')
Suicides per one hundred thousand people, by age
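# By age: bars ordered from the youngest to the oldest bracket
age_levels <- c("5-14 years", "15-24 years", "25-34 years",
"35-54 years", "55-74 years", "75+ years")
age_plot <- cleaned_suicide_df %>%
group_by(age) %>%
summarize(Rate = ((sum(suicides_no))/sum(population))*10^5) %>%
mutate(age = factor(age, levels = age_levels)) %>%
ggplot(aes(x = age, y = Rate, fill = age)) +
geom_bar(stat = "identity") +
labs(title = "Global Suicides per 100k, by Age",
x = "Age",
y = "Suicides per 100k") +
theme(legend.position = "right") +
scale_y_continuous(breaks = seq(0, 30, 5), minor_breaks = NULL)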
age_plot
Suicides per one hundred thousand people, by generation
#By Generation
cleaned_suicide_df %>%
group_by(generation) %>%
summarize(Rate = ((sum(suicides_no))/sum(population))*10^5) %>%
ggplot(aes(x = generation, y = Rate, fill = generation)) +
geom_bar(stat = "identity") +
labs(title = "World suicides per 100k, by generation",
x = "generation",
y = "Suicides per 100k") +
theme(legend.position = "right") +
scale_y_continuous(breaks = seq(1,25,3), minor_breaks = NULL)
# read_csv() already parsed `gdp_for_year ($)` as a number (col_number), so no comma stripping is needed
cleaned_suicide_df$gdp <- as.numeric(cleaned_suicide_df$gdp_per_year)
gdp <- cleaned_suicide_df %>%
group_by(country, year) %>%
summarise(occurance = n(), gdp = sum(gdp)) %>%
mutate(real_gdp = gdp/occurance)
## `summarise()` has grouped output by 'country'. You can override using the `.groups` argument.
gdp
## # A tibble: 2,305 x 5
## # Groups: country [100]
## country year occurance gdp real_gdp
## <chr> <dbl> <int> <dbl> <dbl>
## 1 Albania 1987 12 25879498800 2156624900
## 2 Albania 1988 12 25512000000 2126000000
## 3 Albania 1989 12 28021499856 2335124988
## 4 Albania 1992 12 8513431008 709452584
## 5 Albania 1993 12 14736852456 1228071038
## 6 Albania 1994 12 23828085576 1985673798
## 7 Albania 1995 12 29093988108 2424499009
## 8 Albania 1996 12 39778779504 3314898292
## 9 Albania 1997 12 28318837296 2359903108
## 10 Albania 1998 12 32485485264 2707123772
## # ... with 2,295 more rows
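GDP is a country-year attribute repeated on every sex-by-age row (the `occurance` column above is 12). So `gdp / occurance` just recovers that single repeated value. The sketch below is a more direct base-R route on hypothetical toy rows. It first verifies that the value really is constant within each country-year, then keeps one copy:

```r
# Hypothetical rows: GDP repeated on every row of a country-year
rows <- data.frame(
  country = rep(c("A", "B"), times = c(3, 2)),
  year    = 2000,
  gdp     = c(5e9, 5e9, 5e9, 8e10, 8e10)
)

# Confirm the value is constant within each country-year before collapsing
n_distinct_gdp <- aggregate(gdp ~ country + year, data = rows,
                            FUN = function(x) length(unique(x)))
stopifnot(all(n_distinct_gdp$gdp == 1))

# One GDP per country-year (same result as sum / count when values repeat)
gdp_cy <- aggregate(gdp ~ country + year, data = rows, FUN = function(x) x[1])
gdp_cy
```

The explicit constancy check is the design choice here. Averaging would silently blend values if a country-year ever carried two different GDP figures.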
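Producing a treemap of suicides by country: each box's area is the country's total number of suicides from 1985 to 2015, and its color is the country's suicide rate per 100k. The Russian Federation has the largest box, i.e. the highest number of suicides in the dataset.
# Aggregate to one row per country so the treemap does not sum row-level rates
country_summary <- cleaned_suicide_df %>%
group_by(country) %>%
summarize(Rate = sum(suicides_no) / sum(population) * 10^5,
Total_suicides = sum(suicides_no))
treemap(country_summary, index = "country", vSize = "Total_suicides",
vColor = "Rate", type = "value",
palette = "RdYlBu")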
big5 <- cleaned_suicide_df %>%
filter(country %in% c("Russian Federation", "United States", "United Kingdom", "France", "Germany")) %>%
arrange(year)
# basic symbol-and-line chart, default settings
big5 %>%
group_by(country, year) %>%
summarize(Rate = ((sum(suicides_no))/sum(population))*10^5) %>%
ggplot(aes(x = year, y = Rate, col = country)) +
geom_point(alpha = 0.5) +
geom_smooth(se = F, span = 0.2) +
scale_x_continuous(breaks = seq(1985, 2015, 5), minor_breaks = F) +
labs(title = "United Kingdom, France, United States, Russia, and Germany",
subtitle = "Suicides per 100k population, from 1985 to 2015",
x = "year",
y = "Suicides per 100k",
col = "Country")
## `summarise()` has grouped output by 'country'. You can override using the `.groups` argument.
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning in sqrt(sum.squares/one.delta): NaNs produced
## Warning in sqrt(sum.squares/one.delta): NaNs produced
Boxplots of suicide counts by generation in the United States
#filtering data only for the united states
cleaned_suicide_df_US <- cleaned_suicide_df %>%
filter(country == "United States")
Plotting the boxplot for the US data
US_boxplot<- ggplot(data = cleaned_suicide_df_US) +
geom_boxplot(mapping = aes(x = factor(generation), y = suicides_no,
fill = factor(generation))) +
labs(title = "Number of Suicides by Generation Boxplots",
x = "Generation", y = "Number of Suicides")
US_boxplot
The global trend in the suicide rate, as a line graph (the dashed red line is the pooled rate for the whole period)
cleaned_suicide_df %>%
group_by(year) %>%
summarize(Rate_per_100k = ((sum(suicides_no))/sum(population))*10^5) %>%
ggplot(aes(x = year, y = Rate_per_100k)) +
geom_line(col = "green", size = 2) +
geom_point(col = "green", size = 3) +
geom_hline(yintercept = Worlds_rate, linetype = 2, color = "red", size = 1) +
labs(title = "Global Suicides (per 100k)",
subtitle = "Trend over time, 1985 - 2015",
x = "year",
y = "Suicides per 100k") +
scale_x_continuous(breaks = seq(1985, 2015, 2)) +
scale_y_continuous(breaks = seq(10,20))
Global suicide counts by age bracket over time
cleaned_suicide_df %>%
group_by(age, year) %>%
summarise(Total_suicides = sum(suicides_no)) %>%
ungroup() %>%
ggplot(aes(x = year, y = Total_suicides)) +
geom_line(aes(col = age), size = 1.5) +
labs(title = "Total suicides by age bracket",
subtitle = "From 1985 to 2015",
x = "year",
y = "Total suicides") +
scale_x_continuous(breaks = seq(1985, 2015, 2))
## `summarise()` has grouped output by 'age'. You can override using the `.groups` argument.
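The chart above plots total counts, and these grow with population as well as with risk. Summing suicides and population within each age-year and then dividing puts the brackets on a comparable footing. A minimal base-R sketch on hypothetical cells (numbers invented so that one bracket's count rises while its rate stays flat):

```r
# Hypothetical age-year cells: the 75+ count rises only because its
# population grows; its rate per 100k stays flat.
cells <- data.frame(
  age        = c("35-54 years", "35-54 years", "75+ years", "75+ years"),
  year       = c(1990, 2000, 1990, 2000),
  suicides   = c(170, 170, 100, 150),
  population = c(1e6, 1e6, 4e5, 6e5)
)

# Sum numerators and denominators within each age-year, then form the rate
totals <- aggregate(cbind(suicides, population) ~ age + year,
                    data = cells, FUN = sum)
totals$rate_per_100k <- totals$suicides / totals$population * 1e5
totals
```

With dplyr, the same step would be `group_by(age, year) %>% summarise(rate = sum(suicides_no) / sum(population) * 10^5)`, followed by plotting `rate` instead of the count.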
Illustrating the 10 countries with the most male suicides; Russia leads the chart
cleaned_suicide_df %>%
filter(sex %in% "male") %>%
group_by(country) %>%
summarise(Total_suicides = sum(suicides_no)) %>%
top_n(10) %>%
ggplot(aes(x = Total_suicides, y = reorder(country, Total_suicides), fill = "male")) +
geom_col() +
labs(title = "Top 10 Countries by Total Male Suicides",
subtitle = "From 1985 to 2015",
x = "Cumulative deaths",
y = "Country") +
theme(legend.position = "right")
## Selecting by Total_suicides
cleaned_suicide_df %>%
filter(sex %in% "female") %>%
group_by(country) %>%
summarise(Total_suicides = sum(suicides_no)) %>%
top_n(10) %>%
ggplot(aes(x = Total_suicides, y = reorder(country, Total_suicides), fill = "female")) +
geom_col() +
labs(title = "Top 10 Countries by Total Female Suicides",
subtitle = "From 1985 to 2015",
x = "Cumulative deaths",
y = "Country") +
theme(legend.position = "right")
## Selecting by Total_suicides
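Both charts above rank countries by cumulative counts, which largely reflect population size and years of coverage. Ranking by rate can produce a different list. A minimal base-R sketch with three hypothetical countries (names and numbers invented):

```r
# Hypothetical country totals: one large, one mid-sized, one small country
country_totals <- data.frame(
  country    = c("Big", "Small", "Mid"),
  suicides   = c(50000, 900, 6000),
  population = c(5e8, 2e6, 4e7),
  stringsAsFactors = FALSE
)
country_totals$rate_per_100k <-
  country_totals$suicides / country_totals$population * 1e5  # 10, 45, 15

# Top 2 by cumulative count versus top 2 by rate
top_by_count <- head(country_totals[order(-country_totals$suicides), "country"], 2)
top_by_rate  <- head(country_totals[order(-country_totals$rate_per_100k), "country"], 2)

top_by_count  # "Big" "Mid"
top_by_rate   # "Small" "Mid"
```

In the dplyr pipeline, a rate-based ranking would compute `Rate = sum(suicides_no) / sum(population) * 10^5` per country and then use `slice_max(Rate, n = 10)`.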
General Analysis of Global Suicide Trends from 1985 to 2016
This suicide dataset came from Kaggle. It contains variables including country, year, sex, age, suicides_no, population, suicides per 100k people, country-year, HDI for year, gdp_for_year, gdp_per_capita, and generation, covering the years 1985 to 2016. Fortunately, the dataset wasn't too messy and needed little cleaning: only the removal of variables we didn't use, namely HDI for year, country-year, and suicides per 100k people. Because data is missing for some countries and many countries have no data at all for 2016, we removed 2016 from the dataset entirely. We then computed our own suicide rate per one hundred thousand people by dividing the total number of suicides by the total population and multiplying by 100,000, either overall or within each group (year, age, country, and so on).
The primary reason I chose this dataset is that I strongly believe death by suicide is a serious problem worldwide that is not being taken seriously. I used to ask myself why someone would decide to take his or her own life. What drew my attention to suicide rates in the world was seeing on the news that a famous celebrity had taken his own life. I became curious and wanted to know how serious an issue this is in the world today. We live in difficult times, and reports say the Covid pandemic has increased the rate of suicide. I would have loved to have up-to-date data on suicide numbers in the world so I could examine the current trend.
Suicide rates in the United States increased 33% between 1999 and 2019, with a small decline in 2019. Suicide is the 10th leading cause of death in the United States [3]. It was responsible for more than 47,500 deaths in 2019, which is about one death every 11 minutes [3]. The number of people who think about or attempt suicide is even higher: in 2019, 12 million American adults seriously thought about suicide, 3.5 million planned a suicide attempt, and 1.4 million attempted suicide. Suicide affects all ages. It is the second leading cause of death for people ages 10-34, the fourth leading cause among people ages 35-44, and the fifth leading cause among people ages 45-54. Some groups have higher suicide rates than others, and rates vary by race/ethnicity, age, and other factors. The highest rates are among American Indian/Alaska Native and non-Hispanic White populations. Other Americans with higher than average rates of suicide are veterans, people who live in rural areas, and workers in certain industries and occupations such as mining and construction. Young people who are lesbian, gay, or bisexual have higher rates of suicidal thoughts and behaviors compared to their peers who identify as straight.
I found a number of interesting facts in this analysis. Globally, and for all age groups, the suicide rate is higher for men than for women. The largest numbers of suicides occur in countries such as Russia and the United States, and Russia has more suicides than any other country in the dataset. Unfortunately, there were several things I would have liked to show but couldn't, such as the correlation between the suicide rate and a country's GDP per capita, or the suicide rate for a set of selected countries in a highcharter chart.
Content sources: https://www.cdc.gov/, https://www.cdc.gov/suicide/pdf/suicideTechnicalPackage.pdf