Description of Data

For this homework, I am using the same dataset as in homework 5: the Kaggle dataset "Suicide Rates Overview 1985 to 2016".

I am interested in the average number of suicides per country across the world from 1985 to 2016.

Importing Dataset

Here we load the required packages and import the suicide dataset.

library(readr)
library(ggplot2)
library(ggthemes)
library(babynames)
library(Zelig)
## Loading required package: survival
## 
## Attaching package: 'Zelig'
## The following object is masked from 'package:ggplot2':
## 
##     stat
library(ggrepel)
library(HistData)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ tibble  2.0.1       ✔ dplyr   0.8.0.1
## ✔ tidyr   0.8.2       ✔ stringr 1.3.1  
## ✔ purrr   0.3.0       ✔ forcats 0.3.0
## ── Conflicts ────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ purrr::reduce() masks Zelig::reduce()
## ✖ Zelig::stat()   masks ggplot2::stat()
library(viridis)
## Loading required package: viridisLite
options(dplyr.show_progress = FALSE)
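# Read the raw data; keep country as character rather than factor
data <- read.csv("/Users/Jessica/Desktop/suicide.csv")
data$country <- as.character(data$country)
# One average per country over all of its rows (every year, sex, and age group).
# mean() returns NA for any country that has a missing suicides_no value.
sdata <- data %>%
  select(country, year, sex, suicides_no) %>%
  group_by(country) %>%
  mutate(avgsuicide = mean(suicides_no)) %>%
  select(country, avgsuicide, year) %>%
  unique()  # leaves one row per country and year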
g <- ggplot(sdata, mapping = aes(x = year, y = avgsuicide))
g1 <- g + geom_point()
g1
## Warning: Removed 1717 rows containing missing values (geom_point).

From the graph above, we can see that each country's average suicide number stays flat across the years. This is expected, because avgsuicide is a single average per country computed over all years. One country has the highest average throughout.
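
The warning above says 1717 rows were dropped because avgsuicide is missing for them. With the code as written, that happens for every country that has at least one missing suicides_no value, since mean() returns NA when any input is NA. The following is a minimal sketch of the difference that na.rm = TRUE makes. The `toy` data frame and its values are made up for illustration and are not taken from the Kaggle file.

```r
library(dplyr)

# Toy stand-in for the suicide data (hypothetical values, for illustration only)
toy <- data.frame(
  country     = c("A", "A", "A", "B", "B", "B"),
  year        = c(2000, 2001, 2002, 2000, 2001, 2002),
  suicides_no = c(10, 20, NA, 5, 7, 9)
)

# Default mean(): a single NA makes the whole country's average NA
avg_default <- toy %>%
  group_by(country) %>%
  summarise(avgsuicide = mean(suicides_no))

# na.rm = TRUE: average only over the values that are present
avg_narm <- toy %>%
  group_by(country) %>%
  summarise(avgsuicide = mean(suicides_no, na.rm = TRUE))

print(avg_default)
print(avg_narm)
```

Whether dropping the missing values is appropriate depends on why they are missing, so this is one option and not necessarily the right choice for this dataset.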

g2 <- g + geom_smooth()
g2
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 1717 rows containing non-finite values (stat_smooth).

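From the graph above, the smoothed average suicide number appears to decrease from 1985 to 2016. Because each country's average is constant over time, this trend reflects which countries have rows in each year rather than change within countries.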
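
To look at how suicide numbers change over time, one option is to average within each year instead of within each country. This is a minimal sketch on made-up data. The `toy` data frame, its values, and the name `yearly` are assumptions for illustration, not part of the original analysis.

```r
library(dplyr)

# Toy stand-in for the suicide data (hypothetical values, for illustration only)
toy <- data.frame(
  country     = rep(c("A", "B"), each = 3),
  year        = rep(c(2000, 2001, 2002), times = 2),
  suicides_no = c(10, 20, 30, 5, 7, 9)
)

# One average per year, taken over all rows reported in that year
yearly <- toy %>%
  group_by(year) %>%
  summarise(avgsuicide = mean(suicides_no, na.rm = TRUE))

print(yearly)
```

A table like `yearly` could then be passed to ggplot with aes(x = year, y = avgsuicide) and geom_line() to show the year-to-year trend directly.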

ggplot(data = sdata, aes(x = reorder(country, avgsuicide), y = avgsuicide, fill = avgsuicide)) +
  geom_bar(stat = "identity") +
  scale_fill_viridis(name = "Average Suicide", option = "C") +
  coord_flip() +
  labs(title = 'Average Suicide', subtitle = 'By country', x = "country", y = "Average Suicide") +
  geom_text(aes(label = round(avgsuicide, digits = 3)), color = "black", size = 1.25, hjust = -.3) +
  theme_minimal()
## Warning: Removed 1717 rows containing missing values (position_stack).
## Warning: Removed 1717 rows containing missing values (geom_text).

From this graph, we can tell that the highest average suicide numbers are mostly in developed countries, including Japan, Germany, and France.
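
One caveat about the bar chart: sdata holds one row per country and year, and geom_bar(stat = "identity") stacks all rows that share the same x value. Each bar's length is therefore the country's average multiplied by the number of years it appears in, while the text labels show the average itself. The following is a minimal sketch of collapsing to one row per country before plotting. The `toy` data frame and the name `per_country` are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for sdata: the same per-country average repeated for each year
# (hypothetical values, for illustration only)
toy <- data.frame(
  country    = c("A", "A", "B", "B", "B"),
  year       = c(2000, 2001, 2000, 2001, 2002),
  avgsuicide = c(15, 15, 7, 7, 7)
)

# Keep a single row per country so that no bars are stacked
per_country <- toy %>%
  distinct(country, avgsuicide)

# Bar length now equals the country's average
p <- ggplot(per_country, aes(x = reorder(country, avgsuicide), y = avgsuicide)) +
  geom_col() +
  coord_flip()

print(per_country)
```

The ordering of the countries is unchanged only if every country appears in the same number of years, so the ranking in the stacked version may differ from the ranking of the averages.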