Project2

General Analysis of Global Suicide Trends from 1985 to 2016

Introduction

Suicide presents a major challenge to public health in the United States and worldwide. It contributes to premature death, morbidity, lost productivity, and health care costs. In 2015 (the most recent year of available death data), suicide was responsible for 44,193 deaths in the U.S., which is approximately one suicide every 12 minutes. In 2015, suicide ranked as the 10th leading cause of death and has been among the top 12 leading causes of death since 1975 in the U.S. Overall suicide rates increased 28% from 2000 to 2015. Suicide is a problem throughout the life span; it is the third leading cause of death for youth 10–14 years of age, the second leading cause of death among people 15–24 and 25–34 years of age; the fourth leading cause among people 35 to 44 years of age, the fifth leading cause among people ages 45–54 and eighth leading cause among people 55–64 years of age.

We need to load several packages before we can analyze and visualize the dataset.

# Load the required packages
pacman::p_load(tidyverse, dplyr, tmap, tmaptools, leaflet, leaflet.extras, sf, sp, rio,
               treemap, RColorBrewer, factoextra, knitr, kableExtra,
               highcharter, ggthemes, ggcorrplot)

#Set highcharter options

options(highcharter.theme = hc_theme_smpl(tooltip = list(valueDecimals = 2)))

Setting the working directory and reading the dataset

setwd("C:/Users/clovi/OneDrive/Desktop/DATA 110")
suicide_data <- read_csv("suicide.csv")

## 
## -- Column specification --------------------------------------------------------
## cols(
##   country = col_character(),
##   year = col_double(),
##   sex = col_character(),
##   age = col_character(),
##   suicides_no = col_double(),
##   population = col_double(),
##   `suicides/100k pop` = col_double(),
##   `country-year` = col_character(),
##   `HDI for year` = col_double(),
##   `gdp_for_year ($)` = col_number(),
##   `gdp_per_capita ($)` = col_double(),
##   generation = col_character()
## )

suicide_data

## # A tibble: 27,820 x 12
##    country  year sex    age         suicides_no population `suicides/100k pop`
##    <chr>   <dbl> <chr>  <chr>             <dbl>      <dbl>               <dbl>
##  1 Albania  1987 male   15-24 years          21     312900                6.71
##  2 Albania  1987 male   35-54 years          16     308000                5.19
##  3 Albania  1987 female 15-24 years          14     289700                4.83
##  4 Albania  1987 male   75+ years             1      21800                4.59
##  5 Albania  1987 male   25-34 years           9     274300                3.28
##  6 Albania  1987 female 75+ years             1      35600                2.81
##  7 Albania  1987 female 35-54 years           6     278800                2.15
##  8 Albania  1987 female 25-34 years           4     257200                1.56
##  9 Albania  1987 male   55-74 years           1     137500                0.73
## 10 Albania  1987 female 5-14 years            0     311000                0   
## # ... with 27,810 more rows, and 5 more variables: country-year <chr>,
## #   HDI for year <dbl>, gdp_for_year ($) <dbl>, gdp_per_capita ($) <dbl>,
## #   generation <chr>

# Remove unnecessary columns and rename the variables

suicide_df <- suicide_data %>% 
  select(-c('HDI for year', 'suicides/100k pop','country-year')) %>% 
  rename(gdp_per_year = 'gdp_for_year ($)',
         gdp_per_capital = 'gdp_per_capita ($)') %>% 
  as.data.frame()

str(suicide_df)

## 'data.frame':    27820 obs. of  9 variables:
##  $ country        : chr  "Albania" "Albania" "Albania" "Albania" ...
##  $ year           : num  1987 1987 1987 1987 1987 ...
##  $ sex            : chr  "male" "male" "female" "male" ...
##  $ age            : chr  "15-24 years" "35-54 years" "15-24 years" "75+ years" ...
##  $ suicides_no    : num  21 16 14 1 9 1 6 4 1 0 ...
##  $ population     : num  312900 308000 289700 21800 274300 ...
##  $ gdp_per_year   : num  2.16e+09 2.16e+09 2.16e+09 2.16e+09 2.16e+09 ...
##  $ gdp_per_capital: num  796 796 796 796 796 796 796 796 796 796 ...
##  $ generation     : chr  "Generation X" "Silent" "Generation X" "G.I. Generation" ...

# Some countries have no data for 2016 and others have only partial data, so we drop the year 2016

cleaned_suicide_df <- suicide_df %>% 
  filter(year != 2016)
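One way to check how complete each year is would be to count the distinct countries reporting per year. The sketch below is only an illustration: the `toy` data frame is made up and stands in for `suicide_df`, and `n_countries` is a hypothetical column name.

```r
library(dplyr)

# Toy data, made up for illustration (not rows from suicide.csv)
toy <- data.frame(
  country = c("A", "B", "C", "A", "B", "A"),
  year    = c(2014, 2014, 2014, 2015, 2015, 2016),
  stringsAsFactors = FALSE
)

# Number of distinct countries reporting in each year
coverage <- toy %>%
  distinct(country, year) %>%
  count(year, name = "n_countries")

coverage
```

A year with far fewer reporting countries than its neighbours is a candidate for removal, which is the reasoning applied to 2016 here.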

Overall world suicide rate per 100k, pooled across all years

Worlds_rate <- (sum(as.numeric(cleaned_suicide_df$suicides_no)) / sum(as.numeric(cleaned_suicide_df$population))) * 10^5

unique(cleaned_suicide_df$year) %>% sort()

##  [1] 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
## [16] 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
## [31] 2015

By year

df1 <- cleaned_suicide_df %>% group_by(year) %>% summarize(rate = ((sum(suicides_no))/sum(population))*10^5)

Suicide rate by year

df1

## # A tibble: 31 x 2
##     year  rate
##    <dbl> <dbl>
##  1  1985  11.5
##  2  1986  11.7
##  3  1987  11.6
##  4  1988  11.5
##  5  1989  13.1
##  6  1990  13.2
##  7  1991  13.3
##  8  1992  13.5
##  9  1993  14.5
## 10  1994  15.0
## # ... with 21 more rows

Mean of the yearly suicide rates

mean(df1$rate)

## [1] 13.12023
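The mean of the yearly rates is not the same quantity as the pooled rate stored in `Worlds_rate`: the first gives every year equal weight, while the second weights each year by its population. A minimal sketch with made-up numbers (the `toy` data frame is an assumption for illustration, not part of the dataset) shows how far the two can differ:

```r
library(dplyr)

# Toy data, made up for illustration (not rows from suicide.csv)
toy <- data.frame(
  year        = c(2000, 2000, 2001, 2001),
  suicides_no = c(10, 30, 5, 5),
  population  = c(100000, 100000, 400000, 600000)
)

# Rate per 100k within each year
yearly <- toy %>%
  group_by(year) %>%
  summarise(rate = sum(suicides_no) / sum(population) * 1e5)

# Unweighted mean of the yearly rates
mean_of_yearly_rates <- mean(yearly$rate)

# Pooled rate: all suicides over all person-years
pooled_rate <- sum(toy$suicides_no) / sum(toy$population) * 1e5

c(mean_of_yearly_rates = mean_of_yearly_rates, pooled_rate = pooled_rate)
```

In this toy case the year with the small population dominates the unweighted mean, so it is worth stating which of the two figures a chart or sentence refers to.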

cleaned_suicide_df %>% 
  group_by(age) %>% 
  summarise(suicide_per_100k = (sum(suicides_no) / sum(population)) * 10^5)

## # A tibble: 6 x 2
##   age         suicide_per_100k
##   <chr>                  <dbl>
## 1 15-24 years            9.36 
## 2 25-34 years           13.3  
## 3 35-54 years           17.1  
## 4 5-14 years             0.622
## 5 55-74 years           18.9  
## 6 75+ years             24.5

By country, population and suicide number

dataframe <- cleaned_suicide_df %>% group_by(country, population,suicides_no,year) %>%
  summarize(Rate = ((sum(suicides_no))/sum(population))*10^5)

## `summarise()` has grouped output by 'country', 'population', 'suicides_no'. You can override using the `.groups` argument.

dataframe

## # A tibble: 27,628 x 5
## # Groups:   country, population, suicides_no [27,401]
##    country population suicides_no  year  Rate
##    <chr>        <dbl>       <dbl> <dbl> <dbl>
##  1 Albania      21800           1  1987  4.59
##  2 Albania      22300           1  1988  4.48
##  3 Albania      22500           2  1989  8.89
##  4 Albania      23900           0  1992  0   
##  5 Albania      24200           1  1993  4.13
##  6 Albania      24600           2  1994  8.13
##  7 Albania      24900           1  2000  4.02
##  8 Albania      25100           1  1995  3.98
##  9 Albania      25400           2  1996  7.87
## 10 Albania      25400           3  1997 11.8 
## # ... with 27,618 more rows
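Because `population` and `suicides_no` are part of the grouping, the result above still has 27,628 rows, so each `Rate` is close to a row-level value rather than a country-level one. If one rate per country and year is wanted, a possible sketch is to group by `country` and `year` only. The `toy` data frame below is made up for illustration.

```r
library(dplyr)

# Toy data, made up for illustration (not rows from suicide.csv)
toy <- data.frame(
  country     = c("A", "A", "A", "B", "B"),
  year        = c(2000, 2000, 2001, 2000, 2000),
  suicides_no = c(2, 3, 5, 10, 10),
  population  = c(1000, 1000, 2000, 5000, 5000),
  stringsAsFactors = FALSE
)

# One row per country-year; .groups = "drop" returns an ungrouped result
country_year <- toy %>%
  group_by(country, year) %>%
  summarise(Rate = sum(suicides_no) / sum(population) * 1e5,
            .groups = "drop")

country_year
```

Setting `.groups = "drop"` also silences the grouping message that `summarise()` prints.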

Group by Country

df2<- cleaned_suicide_df %>% 
  group_by(country) %>% 
  summarize(Rate = ((sum(suicides_no))/sum(population))*10^5)

df2

## # A tibble: 100 x 2
##    country               Rate
##    <chr>                <dbl>
##  1 Albania              3.16 
##  2 Antigua and Barbuda  0.553
##  3 Argentina            7.94 
##  4 Armenia              2.45 
##  5 Aruba                8.02 
##  6 Australia           12.9  
##  7 Austria             20.7  
##  8 Azerbaijan           1.48 
##  9 Bahamas              1.42 
## 10 Bahrain              2.76 
## # ... with 90 more rows

pacman::p_load(sf, rnaturalearth, rnaturalearthdata, rgeos)

world <- ne_countries(scale = 'medium', returnclass='sf')

df2 <-  df2 %>% arrange(country)

df2

## # A tibble: 100 x 2
##    country               Rate
##    <chr>                <dbl>
##  1 Albania              3.16 
##  2 Antigua and Barbuda  0.553
##  3 Argentina            7.94 
##  4 Armenia              2.45 
##  5 Aruba                8.02 
##  6 Australia           12.9  
##  7 Austria             20.7  
##  8 Azerbaijan           1.48 
##  9 Bahamas              1.42 
## 10 Bahrain              2.76 
## # ... with 90 more rows

ndf2 <- df2$country

nw <- world$name

tf <- nw %in% ndf2

df4 <- df2 %>% rename(name = country)

world_2 <- merge(world, df4, by = "name", all.x = TRUE)

world

## Simple feature collection with 241 features and 63 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.99893 xmax: 180 ymax: 83.59961
## CRS:           +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## First 10 features:
##   scalerank      featurecla labelrank           sovereignt sov_a3 adm0_dif
## 0         3 Admin-0 country         5          Netherlands    NL1        1
## 1         1 Admin-0 country         3          Afghanistan    AFG        0
## 2         1 Admin-0 country         3               Angola    AGO        0
## 3         1 Admin-0 country         6       United Kingdom    GB1        1
## 4         1 Admin-0 country         6              Albania    ALB        0
## 5         3 Admin-0 country         6              Finland    FI1        1
## 6         3 Admin-0 country         6              Andorra    AND        0
## 7         1 Admin-0 country         4 United Arab Emirates    ARE        0
## 8         1 Admin-0 country         2            Argentina    ARG        0
## 9         1 Admin-0 country         6              Armenia    ARM        0
##   level              type                admin adm0_a3 geou_dif
## 0     2           Country                Aruba     ABW        0
## 1     2 Sovereign country          Afghanistan     AFG        0
## 2     2 Sovereign country               Angola     AGO        0
## 3     2        Dependency             Anguilla     AIA        0
## 4     2 Sovereign country              Albania     ALB        0
## 5     2           Country                Aland     ALD        0
## 6     2 Sovereign country              Andorra     AND        0
## 7     2 Sovereign country United Arab Emirates     ARE        0
## 8     2 Sovereign country            Argentina     ARG        0
## 9     2 Sovereign country              Armenia     ARM        0
##                geounit gu_a3 su_dif              subunit su_a3 brk_diff
## 0                Aruba   ABW      0                Aruba   ABW        0
## 1          Afghanistan   AFG      0          Afghanistan   AFG        0
## 2               Angola   AGO      0               Angola   AGO        0
## 3             Anguilla   AIA      0             Anguilla   AIA        0
## 4              Albania   ALB      0              Albania   ALB        0
## 5                Aland   ALD      0                Aland   ALD        0
## 6              Andorra   AND      0              Andorra   AND        0
## 7 United Arab Emirates   ARE      0 United Arab Emirates   ARE        0
## 8            Argentina   ARG      0            Argentina   ARG        0
## 9              Armenia   ARM      0              Armenia   ARM        0
##                   name            name_long brk_a3             brk_name
## 0                Aruba                Aruba    ABW                Aruba
## 1          Afghanistan          Afghanistan    AFG          Afghanistan
## 2               Angola               Angola    AGO               Angola
## 3             Anguilla             Anguilla    AIA             Anguilla
## 4              Albania              Albania    ALB              Albania
## 5                Aland        Aland Islands    ALD                Aland
## 6              Andorra              Andorra    AND              Andorra
## 7 United Arab Emirates United Arab Emirates    ARE United Arab Emirates
## 8            Argentina            Argentina    ARG            Argentina
## 9              Armenia              Armenia    ARM              Armenia
##   brk_group abbrev postal                    formal_en formal_fr note_adm0
## 0      <NA>  Aruba     AW                        Aruba      <NA>     Neth.
## 1      <NA>   Afg.     AF Islamic State of Afghanistan      <NA>      <NA>
## 2      <NA>   Ang.     AO  People's Republic of Angola      <NA>      <NA>
## 3      <NA>   Ang.     AI                         <NA>      <NA>      U.K.
## 4      <NA>   Alb.     AL          Republic of Albania      <NA>      <NA>
## 5      <NA>  Aland     AI                Åland Islands      <NA>      Fin.
## 6      <NA>   And.    AND      Principality of Andorra      <NA>      <NA>
## 7      <NA> U.A.E.     AE         United Arab Emirates      <NA>      <NA>
## 8      <NA>   Arg.     AR           Argentine Republic      <NA>      <NA>
## 9      <NA>   Arm.    ARM          Republic of Armenia      <NA>      <NA>
##   note_brk            name_sort name_alt mapcolor7 mapcolor8 mapcolor9
## 0     <NA>                Aruba     <NA>         4         2         2
## 1     <NA>          Afghanistan     <NA>         5         6         8
## 2     <NA>               Angola     <NA>         3         2         6
## 3     <NA>             Anguilla     <NA>         6         6         6
## 4     <NA>              Albania     <NA>         1         4         1
## 5     <NA>                Aland     <NA>         4         1         4
## 6     <NA>              Andorra     <NA>         1         4         1
## 7     <NA> United Arab Emirates     <NA>         2         1         3
## 8     <NA>            Argentina     <NA>         3         1         3
## 9     <NA>              Armenia     <NA>         3         1         2
##   mapcolor13  pop_est gdp_md_est pop_year lastcensus gdp_year
## 0          9   103065     2258.0       NA       2010       NA
## 1          7 28400000    22270.0       NA       1979       NA
## 2          1 12799293   110300.0       NA       1970       NA
## 3          3    14436      108.9       NA         NA       NA
## 4          6  3639453    21810.0       NA       2001       NA
## 5          6    27153     1563.0       NA         NA       NA
## 6          8    83888     3660.0       NA       1989       NA
## 7          3  4798491   184300.0       NA       2010       NA
## 8         13 40913584   573900.0       NA       2010       NA
## 9         10  2967004    18770.0       NA       2001       NA
##                      economy              income_grp wikipedia fips_10 iso_a2
## 0       6. Developing region 2. High income: nonOECD        NA    <NA>     AW
## 1  7. Least developed region           5. Low income        NA    <NA>     AF
## 2  7. Least developed region  3. Upper middle income        NA    <NA>     AO
## 3       6. Developing region  3. Upper middle income        NA    <NA>     AI
## 4       6. Developing region  4. Lower middle income        NA    <NA>     AL
## 5 2. Developed region: nonG7    1. High income: OECD        NA    <NA>     AX
## 6 2. Developed region: nonG7 2. High income: nonOECD        NA    <NA>     AD
## 7       6. Developing region 2. High income: nonOECD        NA    <NA>     AE
## 8    5. Emerging region: G20  3. Upper middle income        NA    <NA>     AR
## 9       6. Developing region  4. Lower middle income        NA    <NA>     AM
##   iso_a3 iso_n3 un_a3 wb_a2 wb_a3 woe_id adm0_a3_is adm0_a3_us adm0_a3_un
## 0    ABW    533   533    AW   ABW     NA        ABW        ABW         NA
## 1    AFG    004   004    AF   AFG     NA        AFG        AFG         NA
## 2    AGO    024   024    AO   AGO     NA        AGO        AGO         NA
## 3    AIA    660   660  <NA>  <NA>     NA        AIA        AIA         NA
## 4    ALB    008   008    AL   ALB     NA        ALB        ALB         NA
## 5    ALA    248   248  <NA>  <NA>     NA        ALA        ALD         NA
## 6    AND    020   020    AD   ADO     NA        AND        AND         NA
## 7    ARE    784   784    AE   ARE     NA        ARE        ARE         NA
## 8    ARG    032   032    AR   ARG     NA        ARG        ARG         NA
## 9    ARM    051   051    AM   ARM     NA        ARM        ARM         NA
##   adm0_a3_wb     continent region_un       subregion                  region_wb
## 0         NA North America  Americas       Caribbean  Latin America & Caribbean
## 1         NA          Asia      Asia   Southern Asia                 South Asia
## 2         NA        Africa    Africa   Middle Africa         Sub-Saharan Africa
## 3         NA North America  Americas       Caribbean  Latin America & Caribbean
## 4         NA        Europe    Europe Southern Europe      Europe & Central Asia
## 5         NA        Europe    Europe Northern Europe      Europe & Central Asia
## 6         NA        Europe    Europe Southern Europe      Europe & Central Asia
## 7         NA          Asia      Asia    Western Asia Middle East & North Africa
## 8         NA South America  Americas   South America  Latin America & Caribbean
## 9         NA          Asia      Asia    Western Asia      Europe & Central Asia
##   name_len long_len abbrev_len tiny homepart                       geometry
## 0        5        5          5    4       NA MULTIPOLYGON (((-69.89912 1...
## 1       11       11          4   NA        1 MULTIPOLYGON (((74.89131 37...
## 2        6        6          4   NA        1 MULTIPOLYGON (((14.19082 -5...
## 3        8        8          4   NA       NA MULTIPOLYGON (((-63.00122 1...
## 4        7        7          4   NA        1 MULTIPOLYGON (((20.06396 42...
## 5        5       13          5    5       NA MULTIPOLYGON (((20.61133 60...
## 6        7        7          4    5        1 MULTIPOLYGON (((1.706055 42...
## 7       20       20          6   NA        1 MULTIPOLYGON (((53.92783 24...
## 8        9        9          4   NA        1 MULTIPOLYGON (((-64.54917 -...
## 9        7        7          4   NA        1 MULTIPOLYGON (((45.55234 40...

# The merged column is named `Rate` (capital R), so `world_2$rate` is NULL and has length 0
world_2$rate %>% length()

## [1] 0

nw[!tf]

##   [1] "Afghanistan"               "Angola"                   
##   [3] "Anguilla"                  "Aland"                    
##   [5] "Andorra"                   "American Samoa"           
##   [7] "Antarctica"                "Ashmore and Cartier Is."  
##   [9] "Fr. S. Antarctic Lands"    "Antigua and Barb."        
##  [11] "Burundi"                   "Benin"                    
##  [13] "Burkina Faso"              "Bangladesh"               
##  [15] "Bosnia and Herz."          "St-Barthélemy"            
##  [17] "Bermuda"                   "Bolivia"                  
##  [19] "Brunei"                    "Bhutan"                   
##  [21] "Botswana"                  "Central African Rep."     
##  [23] "China"                     "Côte d'Ivoire"            
##  [25] "Cameroon"                  "Dem. Rep. Congo"          
##  [27] "Congo"                     "Cook Is."                 
##  [29] "Comoros"                   "Cape Verde"               
##  [31] "Curaçao"                   "Cayman Is."               
##  [33] "N. Cyprus"                 "Czech Rep."               
##  [35] "Djibouti"                  "Dominican Rep."           
##  [37] "Algeria"                   "Egypt"                    
##  [39] "Eritrea"                   "Ethiopia"                 
##  [41] "Falkland Is."              "Faeroe Is."               
##  [43] "Micronesia"                "Gabon"                    
##  [45] "Guernsey"                  "Ghana"                    
##  [47] "Guinea"                    "Gambia"                   
##  [49] "Guinea-Bissau"             "Eq. Guinea"               
##  [51] "Greenland"                 "Guam"                     
##  [53] "Hong Kong"                 "Heard I. and McDonald Is."
##  [55] "Honduras"                  "Haiti"                    
##  [57] "Indonesia"                 "Isle of Man"              
##  [59] "India"                     "Indian Ocean Ter."        
##  [61] "Br. Indian Ocean Ter."     "Iran"                     
##  [63] "Iraq"                      "Jersey"                   
##  [65] "Jordan"                    "Siachen Glacier"          
##  [67] "Kenya"                     "Cambodia"                 
##  [69] "St. Kitts and Nevis"       "Korea"                    
##  [71] "Kosovo"                    "Lao PDR"                  
##  [73] "Lebanon"                   "Liberia"                  
##  [75] "Libya"                     "Liechtenstein"            
##  [77] "Lesotho"                   "Macao"                    
##  [79] "St-Martin"                 "Morocco"                  
##  [81] "Monaco"                    "Moldova"                  
##  [83] "Madagascar"                "Marshall Is."             
##  [85] "Macedonia"                 "Mali"                     
##  [87] "Myanmar"                   "Mongolia"                 
##  [89] "N. Mariana Is."            "Mozambique"               
##  [91] "Mauritania"                "Montserrat"               
##  [93] "Malawi"                    "Malaysia"                 
##  [95] "Namibia"                   "New Caledonia"            
##  [97] "Niger"                     "Norfolk Island"           
##  [99] "Nigeria"                   "Niue"                     
## [101] "Nepal"                     "Nauru"                    
## [103] "Pakistan"                  "Pitcairn Is."             
## [105] "Peru"                      "Palau"                    
## [107] "Papua New Guinea"          "Dem. Rep. Korea"          
## [109] "Palestine"                 "Fr. Polynesia"            
## [111] "Russia"                    "Rwanda"                   
## [113] "W. Sahara"                 "Saudi Arabia"             
## [115] "Sudan"                     "S. Sudan"                 
## [117] "Senegal"                   "S. Geo. and S. Sandw. Is."
## [119] "Saint Helena"              "Solomon Is."              
## [121] "Sierra Leone"              "Somaliland"               
## [123] "Somalia"                   "St. Pierre and Miquelon"  
## [125] "São Tomé and Principe"     "Swaziland"                
## [127] "Sint Maarten"              "Syria"                    
## [129] "Turks and Caicos Is."      "Chad"                     
## [131] "Togo"                      "Tajikistan"               
## [133] "Timor-Leste"               "Tonga"                    
## [135] "Tunisia"                   "Taiwan"                   
## [137] "Tanzania"                  "Uganda"                   
## [139] "Vatican"                   "St. Vin. and Gren."       
## [141] "Venezuela"                 "British Virgin Is."       
## [143] "U.S. Virgin Is."           "Vietnam"                  
## [145] "Vanuatu"                   "Wallis and Futuna Is."    
## [147] "Samoa"                     "Yemen"                    
## [149] "Zambia"                    "Zimbabwe"
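Some of the unmatched names are countries that do appear in the suicide data under a different spelling. For example, the dataset uses "Russian Federation" and "Antigua and Barbuda", while the map uses "Russia" and "Antigua and Barb.". A hedged sketch of one way to harmonise names before merging follows. The lookup vector `name_fixes` is hypothetical and lists only those two pairs, and the `Rate` values are placeholders, not results.

```r
library(dplyr)

# Hypothetical lookup: dataset spelling -> map spelling (only two pairs shown)
name_fixes <- c(
  "Russian Federation"  = "Russia",
  "Antigua and Barbuda" = "Antigua and Barb."
)

# Stand-ins for df2 and world$name; Rate values are placeholders
rates <- data.frame(
  country = c("Albania", "Russian Federation", "Antigua and Barbuda"),
  Rate    = c(1, 2, 3),
  stringsAsFactors = FALSE
)
map_names <- data.frame(
  name = c("Albania", "Russia", "Antigua and Barb.", "Angola"),
  stringsAsFactors = FALSE
)

# Use the map spelling where a fix exists, otherwise keep the original name
rates_fixed <- rates %>%
  mutate(name = coalesce(unname(name_fixes[country]), country))

joined <- left_join(map_names, rates_fixed, by = "name")

# Map entries that still have no rate after the fix
unmatched <- joined$name[is.na(joined$Rate)]
unmatched
```

Entries that remain unmatched after this step are places with no rows in the suicide data, and they stay `NA` on the map.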

world$name %>% length()

## [1] 241

World Map showing the suicide rate in different countries

ggplot(data = world_2)+
  geom_sf(aes(fill = Rate))+
  scale_fill_viridis_c(option = "plasma", trans='sqrt')

Figuring out the number of suicides per one hundred thousand people by age

#By Age

age_plot <- cleaned_suicide_df %>% 
  group_by(age) %>% 
  summarize(Rate = ((sum(suicides_no))/sum(population))*10^5) %>% 
  ggplot(aes(x = age, y = Rate, fill = age)) +
  geom_bar(stat = "identity") +
  labs(title = "Global Suicides per 100k, seen by Age",
       x = "Age",
       y = "Suicide per 100k") +
  theme(legend.position = "right") + 
  scale_y_continuous(breaks = seq(0, 30, 1), minor_breaks = NULL)
age_plot

Figuring out the number of suicides per one hundred thousand people by generation

#By Generation

cleaned_suicide_df %>% 
  group_by(generation) %>% 
   summarize(Rate = ((sum(suicides_no))/sum(population))*10^5) %>% 
  ggplot(aes(x = generation, y = Rate, fill = generation)) +
  geom_bar(stat = "identity") +
  labs(title = "World suicides per 100k, by generation",
       x = "generation",
       y = "Suicides per 100k") + 
  theme(legend.position = "right") +
  scale_y_continuous(breaks = seq(1, 25, 3), minor_breaks = NULL)

cleaned_suicide_df$gdp <- gsub(",","",cleaned_suicide_df$gdp_per_year) %>% 
  as.numeric()
gdp <- cleaned_suicide_df %>% 
  group_by(country, year) %>% 
  summarise(occurance = n(), gdp = sum(gdp)) %>% 
  mutate(real_gdp = gdp/occurance)

## `summarise()` has grouped output by 'country'. You can override using the `.groups` argument.

gdp

## # A tibble: 2,305 x 5
## # Groups:   country [100]
##    country  year occurance         gdp   real_gdp
##    <chr>   <dbl>     <int>       <dbl>      <dbl>
##  1 Albania  1987        12 25879498800 2156624900
##  2 Albania  1988        12 25512000000 2126000000
##  3 Albania  1989        12 28021499856 2335124988
##  4 Albania  1992        12  8513431008  709452584
##  5 Albania  1993        12 14736852456 1228071038
##  6 Albania  1994        12 23828085576 1985673798
##  7 Albania  1995        12 29093988108 2424499009
##  8 Albania  1996        12 39778779504 3314898292
##  9 Albania  1997        12 28318837296 2359903108
## 10 Albania  1998        12 32485485264 2707123772
## # ... with 2,295 more rows

Producing a treemap of the suicide rate per country. The Russian Federation stands out with the highest number of suicides in the world.

treemap(dataframe, index="country", vSize="Rate", 
                 vColor="population",type="value", 
                palette="RdYlBu")

big5 <- cleaned_suicide_df %>%
  filter(country == "Russian Federation" | country == "United States" | country == "United Kingdom" | country == "France" | country == "Germany") %>% 
  arrange(year)

# basic symbol-and-line chart, default settings
big5 %>% 
  group_by(country, year) %>% 
  summarize(Rate = ((sum(suicides_no))/sum(population))*10^5) %>% 
  ggplot(aes(x = year, y = Rate, col = country)) + 
  geom_point(alpha = 0.5) +
  geom_smooth(se = F, span = 0.2) +
  scale_x_continuous(breaks = seq(1985, 2015, 5), minor_breaks = F) +
  labs(title = "United Kingdom, France, United States, Russia, and Germany",
       subtitle = "Suicides per 100k population, from 1985 to 2015",
       x = "year",
       y = "Suicides per 100k",
       col = "Country")

## `summarise()` has grouped output by 'country'. You can override using the `.groups` argument.

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

## Warning in sqrt(sum.squares/one.delta): NaNs produced

## Warning in sqrt(sum.squares/one.delta): NaNs produced

Boxplots for the different generations in the United States

# Filter the data for the United States only
cleaned_suicide_df_US <- cleaned_suicide_df %>% 
  filter(country == "United States")

Plotting the boxplot for the US data

US_boxplot<- ggplot(data = cleaned_suicide_df_US) + 
  geom_boxplot(mapping = aes(x = factor(generation), y = suicides_no, 
                             fill = factor(generation))) +
  labs(title = "Number of Suicides by Generation Boxplots", 
       x = "Generation", y = "Number of Suicides")
US_boxplot

The Global Trend in suicide deaths in the form of a line graph

cleaned_suicide_df %>% 
  group_by(year) %>% 
  summarize(Rate_per_100k = ((sum(suicides_no))/sum(population))*10^5) %>%  
  ggplot(aes(x = year, y = Rate_per_100k)) + 
  geom_line(col = "green", size = 2) + 
  geom_point(col = "green", size = 3) + 
  geom_hline(yintercept = Worlds_rate, linetype = 2, color = "red", size = 1) +
  labs(title = "Global Suicides (per 100k)",
       subtitle = "Trend over time, 1985 - 2015",
       x = "year",
       y = "Suicides per 100k") + 
  scale_x_continuous(breaks = seq(1985, 2015, 2)) + 
  scale_y_continuous(breaks = seq(10,20))

A representation of global suicide numbers by age bracket

cleaned_suicide_df %>% 
  group_by(age, year) %>% 
  summarise(Total_suicides = sum(suicides_no)) %>% 
  ungroup() %>% 
  ggplot(aes(x = year, y = Total_suicides)) + 
  geom_line(aes(col = age), size = 1.5) +
  labs(title = "Suicide graph based on age",
       subtitle = "From 1985 to 2015",
       x = "year",
       y = "Suicides per age bracket") +
  scale_x_continuous(breaks = seq(1985, 2015, 2)) + 
  scale_y_continuous(labels = scales::comma)  # totals are large counts, so label them with thousands separators

## `summarise()` has grouped output by 'age'. You can override using the `.groups` argument.

Illustrating the top 10 countries by total male suicides. Russia leads the chart.

cleaned_suicide_df %>% 
  filter(sex %in% "male") %>% 
  group_by(country) %>% 
  summarise(Total_suicides = sum(suicides_no)) %>% 
  top_n(10) %>% 
  ggplot(aes(x = Total_suicides, y = reorder(country, Total_suicides), fill = "male")) +
  geom_col() +
  labs(title = "Top 10 Countries by Total Male Suicides",
       subtitle = "From 1985 to 2015",
       x = "Cumulative deaths",
       y = "Country") +
  theme(legend.position = "right")

## Selecting by Total_suicides

cleaned_suicide_df %>% 
  filter(sex %in% "female") %>% 
  group_by(country) %>% 
  summarise(Total_suicides = sum(suicides_no)) %>% 
  top_n(10) %>% 
  ggplot(aes(x = Total_suicides, y = reorder(country, Total_suicides), fill = "female")) +
  geom_col() +
  labs(title = "Top 10 Countries by Total Female Suicides",
       subtitle = "From 1985 to 2015",
       x = "Cumulative deaths",
       y = "Country") +
  theme(legend.position = "right")

## Selecting by Total_suicides
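The two charts above rank countries by total deaths, which favours large populations. To compare men and women on an equal footing, the same per-100k formula used earlier can be grouped by `sex`. This is only a sketch on a made-up `toy` data frame, not a result from the dataset.

```r
library(dplyr)

# Toy data, made up for illustration (not rows from suicide.csv)
toy <- data.frame(
  sex         = c("male", "male", "female", "female"),
  suicides_no = c(30, 10, 5, 5),
  population  = c(100000, 100000, 100000, 100000),
  stringsAsFactors = FALSE
)

# Suicides per 100k within each sex
by_sex <- toy %>%
  group_by(sex) %>%
  summarise(Rate = sum(suicides_no) / sum(population) * 1e5,
            .groups = "drop")

by_sex
```

Replacing `toy` with `cleaned_suicide_df` would give the corresponding rates for the real data.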

General Analysis of Global Suicide Trends from 1985 to 2016

This suicide dataset comes from Kaggle and contains the variables country, year, sex, age, suicides_no, population, suicides per 100k people, country-year, HDI for year, gdp_for_year, gdp_per_capita, and generation, covering the years 1985 to 2016. Fortunately for this student, the dataset was not too messy and did not need much cleaning. It only required dropping the variables we did not need: HDI for year, country-year, and suicides per 100k people. Because some countries have no data for 2016 and others have only partial data, we removed 2016 from the dataset entirely. We then computed the suicide rate per one hundred thousand people by dividing the sum of suicides by the sum of the population and multiplying by one hundred thousand.

I chose this dataset because I strongly believe that death by suicide is a serious problem in the world and that it is not being taken seriously. I used to ask myself why someone would decide to take his or her own life. What drew my attention to suicide rates was seeing on the news that a famous celebrity had taken his own life; I became curious and wanted to know how serious an issue this is in the world today. We live in difficult times, and reports say the Covid pandemic has increased the rate of suicide. I would have loved to have up-to-date data on suicide numbers so I could see the current trend.

Suicide rates increased 33% between 1999 and 2019, with a small decline in 2019. Suicide is the 10th leading cause of death in the United States. It was responsible for more than 47,500 deaths in 2019, which is about one death every 11 minutes. The number of people who think about or attempt suicide is even higher. In 2019, 12 million American adults seriously thought about suicide, 3.5 million planned a suicide attempt, and 1.4 million attempted suicide. Suicide affects all ages: it is the second leading cause of death for people ages 10-34, the fourth leading cause among people ages 35-44, and the fifth leading cause among people ages 45-54. Some groups have higher suicide rates than others. Suicide rates vary by race/ethnicity, age, and other factors. The highest rates are among American Indian/Alaska Native and non-Hispanic White populations. Other Americans with higher than average rates of suicide are veterans, people who live in rural areas, and workers in certain industries and occupations like mining and construction. Young people who are lesbian, gay, or bisexual have higher rates of suicidal thoughts and behaviors than their peers who identify as straight.

I found a number of interesting facts. Globally, and for all age groups, the suicide rate is higher for men than for women. Suicide is most likely to happen in the countries with the highest suicide numbers, such as Russia and the United States of America, and Russia records more suicides than any other country in the charts above. Unfortunately, there were many things I would have loved to show but could not, such as the correlation between the suicide rate and a country's GDP per capita, or the suicide rate for a selection of countries in a highcharter chart.
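The correlation between suicide rate and GDP per capita mentioned above could be approached along the following lines. This is a sketch under stated assumptions, not a result: the `toy` data frame is made up, the column name `gdp_per_capital` follows the renaming used earlier in this report, and averaging GDP per capita within each country is one simple choice among several.

```r
library(dplyr)

# Toy data, made up for illustration (not rows from suicide.csv)
toy <- data.frame(
  country         = c("A", "A", "B", "B", "C", "C"),
  suicides_no     = c(4, 6, 10, 30, 5, 10),
  population      = c(50000, 50000, 100000, 100000, 50000, 50000),
  gdp_per_capital = c(1000, 1000, 2500, 2500, 3000, 3000),
  stringsAsFactors = FALSE
)

# One rate and one GDP-per-capita value per country
by_country <- toy %>%
  group_by(country) %>%
  summarise(Rate = sum(suicides_no) / sum(population) * 1e5,
            gdp_per_capital = mean(gdp_per_capital),
            .groups = "drop")

# Pearson correlation between the two country-level columns
r <- cor(by_country$Rate, by_country$gdp_per_capital)
r
```

With the real data, `toy` would be replaced by `cleaned_suicide_df`, and a scatter plot of the two columns would be a natural companion to the single correlation value.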

Content sources: https://www.cdc.gov/, https://www.cdc.gov/suicide/pdf/suicideTechnicalPackage.pdf

Onya Clovis

4/15/2021