Suicide Rates Investigation

Suicide prevention

Yenting, Liu (s3750625)

Last updated: JUN 2nd, 2019

Introduction

This dataset comes from Kaggle (https://www.kaggle.com/russellyates88/suicide-rates-overview-1985-to-2016). It was compiled by combining key indicators from several sources, such as the World Bank and the World Health Organization (see References for details).
The question of interest is whether country-level information in this data can help explain differences in suicide rates.
RPubs link: http://rpubs.com/yentingliu/501481

Problem Statement

Since suicide rates have been rising in many parts of the world, mental health in the 21st century is a major concern.
To look for signals in the data, I will conduct a Chi-square test of association, a two-sample t-test, and a simple linear regression to find out whether GDP per capita, gender, and age are related to suicide rates across countries.

Data Preparation

First, the target variable is suicides/100k pop (the number of suicides per 100,000 people). I look for relationships between it and gdp_per_capita (GDP per person in a country), sex, and age, with the aim of informing suicide prevention.
Second, sex and age are converted to factors whose levels identify the group each record describes; year is converted to an ordered factor as well.

getwd()

## [1] "/Users/qmoa_liu/Downloads"

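# Packages used in this report: readr (read_csv), dplyr (%>%, filter, group_by),
# forecast (BoxCox), car (leveneTest)
library(readr)
library(dplyr)
library(forecast)
library(car)

suicide <- read_csv("/Users/qmoa_liu/Downloads/master.csv")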

# Convert sex, age, and year to factors.
# Note: any value not listed in `levels` becomes NA (the Kaggle file also
# appears to contain a "5-14 years" age group, which is therefore dropped here).
suicide$sex <- suicide$sex %>% factor(levels = c("male", "female"))
suicide$age <- suicide$age %>% factor(levels = c("15-24 years", "25-34 years",
                                                 "35-54 years", "55-74 years",
                                                 "75+ years"),
                                      ordered = TRUE)
suicide$year <- suicide$year %>% factor(levels = as.character(1985:2016),
                                        ordered = TRUE)

# Missing value
suicide$`suicides/100k pop` %>% is.na() %>% sum()

## [1] 0
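
One caveat about the factor() calls above: a value that is not listed in levels is silently converted to NA, and the missing-value check only covers the suicides/100k pop column. The sketch below uses a small made-up vector (age_raw is hypothetical, not the real data) to show the effect and one way to count such rows.

```r
# Hypothetical raw values; "5-14 years" is not among the levels used above
age_raw <- c("5-14 years", "15-24 years", "75+ years", "5-14 years")
age_levels <- c("15-24 years", "25-34 years", "35-54 years",
                "55-74 years", "75+ years")

age_fct <- factor(age_raw, levels = age_levels, ordered = TRUE)

# Values outside `levels` become NA
n_dropped <- sum(is.na(age_fct))
n_dropped

# table() ignores NA by default; useNA = "ifany" makes them visible
table(age_fct, useNA = "ifany")
```

Running the same is.na() count on suicide$age would show how many records fall outside the five listed age groups.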

# Filter outliers
boxplot(suicide$`suicides/100k pop`)

outliers=boxplot(suicide$`suicides/100k pop`, plot=FALSE)$out
min(outliers)

## [1] 40.19

suicide_clean=suicide %>% filter(`suicides/100k pop`<min(outliers))
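
The outlier rule used by boxplot() can be reproduced by hand. The sketch below is a minimal illustration on a made-up vector (x is hypothetical, not the real data): it computes Tukey's hinges with fivenum(), builds the 1.5 * IQR fences, and checks them against boxplot.stats(), which applies the same rule without drawing a plot.

```r
# Hypothetical right-skewed rates (not the real data)
x <- c(0, 0.5, 1, 2, 3, 4, 5, 8, 13, 20, 45, 60)

# boxplot() uses Tukey's hinges (fivenum) and a 1.5 * IQR fence
h <- fivenum(x)
iqr <- h[4] - h[2]
upper_fence <- h[4] + 1.5 * iqr
lower_fence <- h[2] - 1.5 * iqr

manual_out <- x[x < lower_fence | x > upper_fence]
box_out <- boxplot.stats(x)$out   # same rule, without drawing a plot

# Keeping values below the smallest outlier only works when every
# outlier lies above the upper fence, as in this toy vector
kept <- x[x < min(box_out)]
```

Filtering on `< min(outliers)` is therefore a shortcut that assumes all outliers are on the high side, which holds for a non-negative, right-skewed rate such as this one.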

Descriptive Statistics

# A quick summary of the suicide rate, and its mean for each year.
table1=suicide_clean$`suicides/100k pop` %>% summary()
table1

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.710   4.990   8.547  13.280  40.160

table2=suicide_clean %>% group_by(year) %>% summarise(mean(`suicides/100k pop`))
table2

Data Visualisation

First, the mean suicide rate was lowest in 1985 and 2009, at about 7.5 per 100k people, and rose to its highest level after 2011, at almost 10 per 100k people (see Diagram 1).
In addition, the distributions of both GDP per capita and suicides per 100k people are right-skewed (see Diagram 2).
The scatter plot of GDP per capita against suicides per 100k people shows no clear linear relationship (see Diagram 3).
Finally, among records with zero suicides, the age distribution is similar for both genders, with around 35% in the 75+ years group (see Diagram 5).

# Mean suicide rate by year (Diagram 1)
plot(table2, main = "The change of suicide number in the whole segment",
     ylab = "The mean of suicides/100k pop")

# Distribution of suicides per 100k people (Diagram 2-1)
hist(suicide_clean$`suicides/100k pop`,
     main = "The distribution of suicides per 100k people",xlab = "Number of suicides per 100k people")

# Diagram 2-2
hist(suicide_clean$`gdp_per_capita ($)`,
     main = "The distribution of GDP per capita ($)",xlab = "Amount of GDP per capita")

# Scatter plot of GDP per capita against suicides per 100k people (Diagram 3)
plot(`gdp_per_capita ($)` ~ `suicides/100k pop`, data = suicide_clean,
     main = "Scatter plot between GDP per capita with suicide per 100k people",
     xlab = "Number of suicides per 100k people",
     ylab = "Amount of GDP per capita")

# Box-Cox transform both variables (original data, outliers included) (Diagram 4)
boxcox_suicide <- BoxCox(suicide$`suicides/100k pop`, lambda = "auto")
boxcox_gdp_per_cap <- BoxCox(suicide$`gdp_per_capita ($)`, lambda = "auto")
hist(boxcox_suicide,
     main = "The distribution of suicides per 100k people (Transformed)",
     xlab = "Number of suicides per 100k people")

hist(boxcox_gdp_per_cap,
     main = "The distribution of GDP per capita ($) (Transformed)",
     xlab = "Amount of GDP per capita")

# Reuse the transformed vectors instead of recomputing them
plot(boxcox_suicide ~ boxcox_gdp_per_cap,
     main = "Scatter plot between GDP per capita with suicide per 100k people (Transformed)",
     xlab = "GDP per capita (transformed)",
     ylab = "Suicides per 100k people (transformed)")
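
For reference, the textbook Box-Cox transform for a fixed lambda can be written in a few lines. This is only a sketch under stated assumptions: box_cox() is a hypothetical helper, it requires strictly positive input, and it does not reproduce how forecast::BoxCox() chooses lambda when lambda = "auto" or how it treats non-positive values.

```r
# Box-Cox transform for a fixed lambda; x must be strictly positive here
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

x <- c(1, 2, 5, 10, 50)          # hypothetical positive values
box_cox(x, lambda = 0)           # log transform
box_cox(x, lambda = 0.5)         # square-root-like transform
box_cox(x, lambda = 1)           # shifts the data by 1, shape unchanged
```

Because the suicide rate includes zeros (its minimum is 0.000), a lambda at or near zero would not give finite values for those records, so the transformed histograms should be read with some care.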

# Which groups are least likely to record a suicide? (Diagram 5)
non_suicide=suicide_clean %>% filter(suicides_no==0)
table(non_suicide$age)

## 
## 15-24 years 25-34 years 35-54 years 55-74 years   75+ years 
##         511         451         370         559         922

table(non_suicide$sex)

## 
##   male female 
##   1624   2657

# Proportion of each age group within each gender (columns sum to 1)
table3 <- table(non_suicide$age, non_suicide$sex) %>% prop.table(margin = 2)
table3 %>% barplot(main = "Non-suicide group",
                   ylab = "Proportion within gender",
                   ylim = c(0, .5), legend.text = rownames(table3),
                   beside = TRUE,
                   args.legend = list(x = "top", horiz = TRUE, title = "Age"),
                   xlab = "Gender")
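
The overall age shares in the zero-suicide group can be checked directly from the counts printed above. The sketch below only re-enters those printed counts (the variable names are made up for illustration); it also shows that the age counts add up to fewer records than the sex counts.

```r
# Counts copied from the table() output above
age_counts <- c("15-24 years" = 511, "25-34 years" = 451, "35-54 years" = 370,
                "55-74 years" = 559, "75+ years" = 922)
sex_counts <- c(male = 1624, female = 2657)

# Overall share of each age group among records with a known age
age_share <- prop.table(age_counts)
round(age_share, 3)

# Records counted by sex but not by age (age is NA for these rows)
n_age_missing <- sum(sex_counts) - sum(age_counts)
n_age_missing
```

The gap between the two totals corresponds to records whose age is NA after the factor conversion, so the age-by-gender proportions and the Chi-square test below describe only the five listed age groups.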

Hypothesis Testing for the association between gender and age for non-suicide group

The test uses a 5% significance level. With a sample this large, the expected count in each cell should be well above 5, so the Chi-square approximation is reasonable.
H0: There is no association between gender and age in the non-suicide group
HA: There is an association between gender and age in the non-suicide group
In conclusion, the test is statistically significant (X-squared = 17.427, df = 4, p = 0.0016), so H0 is rejected: there is evidence of an association between gender and age in the non-suicide group.

# Reuse non_suicide (records with suicides_no equal to 0) from the previous section
chi=chisq.test(non_suicide$age,non_suicide$sex)
chi

## 
##  Pearson's Chi-squared test
## 
## data:  non_suicide$age and non_suicide$sex
## X-squared = 17.427, df = 4, p-value = 0.001596
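
To make the mechanics of the test concrete, the sketch below computes the Chi-square statistic by hand on a made-up 5 x 2 table (obs is hypothetical, not the real cross-tabulation) and compares it with chisq.test(). The last line recomputes the p-value from the statistic and degrees of freedom reported above.

```r
# Hypothetical 5 x 2 table of counts (age group by gender), not the real data
obs <- matrix(c(30, 45,
                25, 40,
                20, 35,
                35, 50,
                60, 80),
              ncol = 2, byrow = TRUE,
              dimnames = list(age = c("15-24", "25-34", "35-54", "55-74", "75+"),
                              sex = c("male", "female")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

x2 <- sum((obs - expected)^2 / expected)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)
p_manual <- pchisq(x2, df = df, lower.tail = FALSE)

# chisq.test() gives the same statistic for tables larger than 2 x 2
res <- chisq.test(obs)

# p-value recomputed from the statistic reported above
p_reported <- pchisq(17.427, df = 4, lower.tail = FALSE)
```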

Hypothesis Testing for the difference of suicide number in genders

The test uses a 5% significance level. With a sample this large, the central limit theorem justifies treating the group means as approximately normal.
H0: There is no difference in the mean suicide rate between genders
HA: There is a difference in the mean suicide rate between genders
To sum up, the test is statistically significant (t = 64.174, p < 0.001). The mean rate for males (12.52 per 100k) is higher than for females (5.12 per 100k), a difference of about 7.4 per 100k people (95% CI 7.18 to 7.63).

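# Does the suicide rate differ by gender?
# Levene's test checks equality of variances; Welch's t-test (var.equal = FALSE)
# is used because it does not assume equal variances.
leveneTest(suicide_clean$`suicides/100k pop`~suicide_clean$sex,
           data = suicide_clean)

t.test(suicide_clean$`suicides/100k pop`~suicide_clean$sex,data = suicide_clean,var.equal=FALSE)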

## 
##  Welch Two Sample t-test
## 
## data:  suicide_clean$`suicides/100k pop` by suicide_clean$sex
## t = 64.174, df = 17961, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  7.176116 7.628291
## sample estimates:
##   mean in group male mean in group female 
##            12.522013             5.119809
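
The Welch statistic and its degrees of freedom can be reproduced by hand. The sketch below uses two small made-up samples (a and b are hypothetical, not the real data) and checks the manual result against t.test(); the last line recomputes the difference in means from the group means reported above.

```r
# Hypothetical samples with unequal variances (not the real data)
a <- c(12.1, 15.3, 9.8, 20.4, 11.0, 14.7, 18.2, 10.5)
b <- c(4.9, 6.1, 5.2, 3.8, 5.5, 4.4)

# Squared standard error of each group mean
se2_a <- var(a) / length(a)
se2_b <- var(b) / length(b)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_manual <- (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
df_manual <- (se2_a + se2_b)^2 /
  (se2_a^2 / (length(a) - 1) + se2_b^2 / (length(b) - 1))

res <- t.test(a, b, var.equal = FALSE)

# Difference in group means reported in the output above
diff_reported <- 12.522013 - 5.119809
```

The recomputed difference sits at the centre of the reported 95% confidence interval, as expected for a two-sided interval.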

Hypothesis Testing for the correlation between suicide number with GDP per capita

The test uses a 5% significance level. With a sample this large, the sampling distribution of the slope is treated as approximately normal.
H0: There is no linear relationship between the suicide rate and GDP per capita
HA: There is a linear relationship between the suicide rate and GDP per capita
In summary, the slope is statistically significant (t = 16.94, p < 0.001) and positive: an increase of $10,000 in GDP per capita is associated with about 0.53 more suicides per 100k people. However, only about 1.1% of the variability in the suicide rate is explained by the linear relationship with GDP per capita (R-squared = 0.011).

# Simple linear regression of the suicide rate on GDP per capita
model1 <- lm(suicide_clean$`suicides/100k pop` ~ suicide_clean$`gdp_per_capita ($)`
             , data =suicide_clean)
model1 %>% summary()

## 
## Call:
## lm(formula = suicide_clean$`suicides/100k pop` ~ suicide_clean$`gdp_per_capita ($)`, 
##     data = suicide_clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.303  -7.512  -3.526   4.714  32.404 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)                        7.646e+00  7.987e-02   95.74   <2e-16
## suicide_clean$`gdp_per_capita ($)` 5.268e-05  3.111e-06   16.94   <2e-16
##                                       
## (Intercept)                        ***
## suicide_clean$`gdp_per_capita ($)` ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.561 on 25772 degrees of freedom
## Multiple R-squared:  0.01101,    Adjusted R-squared:  0.01097 
## F-statistic: 286.8 on 1 and 25772 DF,  p-value: < 2.2e-16
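
The size of the effect is easier to read after a little arithmetic on the reported values. The sketch below only re-enters numbers from the summary() output above (the variable names are made up for illustration); it rescales the slope to a $10,000 change and recovers R-squared from the F-statistic, which for a single predictor satisfies R^2 = F / (F + residual df).

```r
# Values copied from the summary() output above
slope  <- 5.268e-05   # change in suicides/100k pop per extra dollar of GDP per capita
f_stat <- 286.8
df_res <- 25772

# Change in suicides/100k pop for a $10,000 increase in GDP per capita
slope_per_10k <- slope * 10000

# For one predictor: R^2 = F / (F + residual df)
r2_from_f <- f_stat / (f_stat + df_res)

# Absolute correlation between the two variables
r_abs <- sqrt(r2_from_f)
```

A correlation of roughly 0.1 is weak, which is consistent with the scatter plot in Diagram 3 despite the very small p-value.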

Conclusion & Discussion

In summary, among records with no suicides, age group and gender are associated.
The suicide rate (per 100k people) is higher for men than for women.
There is a positive relationship between the suicide rate and GDP per capita.
However, although GDP per capita is a statistically significant factor, it explains very little of the variation in suicide rates (R-squared = 0.01101).
Moreover, since this project is aimed at suicide prevention, further factors should be explored to identify more useful predictors.

References

Kaggle. Suicide rates overview 1985 to 2016 [dataset]. Retrieved from https://www.kaggle.com/russellyates88/suicide-rates-overview-1985-to-2016
United Nations Development Program. (2018). Human development index (HDI). Retrieved from http://hdr.undp.org/en/indicators/137506
World Bank. (2018). World development indicators: GDP (current US$) by country: 1985 to 2016. Retrieved from http://databank.worldbank.org/data/source/world-development-indicators#
[Szamil]. (2017). Suicide in the Twenty-First Century [dataset]. Retrieved from https://www.kaggle.com/szamil/suicide-in-the-twenty-first-century/notebook
World Health Organization. (2018). Suicide prevention. Retrieved from http://www.who.int/mental_health/suicide-prevention/en/