Qing Miao (s3759783)
Last updated: 01 June, 2019
Suicide has a devastating impact on families and whole communities. Fortunately, suicides can be prevented. This report investigates whether Australia’s suicide rate was lower in 2015 than it was 30 years earlier. A hypothesis test will be conducted on the 1985 and 2015 data to determine whether the rate has changed.
The dataset used for this report is sourced from the open data website https://www.kaggle.com/russellyates88/suicide-rates-overview-1985-to-2016.
This compiled dataset was created to locate signals in relation to increasing suicide rates in various groups globally, across the socio-economic spectrum. For the sake of this investigation, data relating to other countries are filtered out. I also separately aggregated the Australian data to show the overall trend.
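library(readxl)  # read_xls()
library(dplyr)   # %>%, filter(), group_by(), summarise()
library(car)     # qqPlot(), leveneTest()
suicide <- read_xls("master.xls")
suicide_oz <- suicide %>% filter(country == "Australia")
suicide_oz %>% head()
The suicide data comprise 12 variables and 360 observations. Specifically, the variables represent: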
When the data is read into R, sex, age and generation are incorrectly classified as characters.
str(suicide_oz)
## Classes 'tbl_df', 'tbl' and 'data.frame': 360 obs. of 12 variables:
## $ country : chr "Australia" "Australia" "Australia" "Australia" ...
## $ year : num 1985 1985 1985 1985 1985 ...
## $ sex : chr "male" "male" "male" "male" ...
## $ age : chr "75+ years" "25-34 years" "55-74 years" "15-24 years" ...
## $ suicides_no : num 67 357 282 315 411 36 109 143 69 64 ...
## $ population : num 219000 1299100 1177400 1355800 1906800 ...
## $ rates : num 30.6 27.5 23.9 23.2 21.6 ...
## $ country-year : chr "Australia1985" "Australia1985" "Australia1985" "Australia1985" ...
## $ HDI for year : num NA NA NA NA NA NA NA NA NA NA ...
## $ gdp_for_year ($) : num 1.8e+11 1.8e+11 1.8e+11 1.8e+11 1.8e+11 ...
## $ gdp_per_capita ($): num 12374 12374 12374 12374 12374 ...
## $ generation : chr "G.I. Generation" "Boomers" "G.I. Generation" "Generation X" ...
The following steps convert sex, age and generation into factors and specify their levels and ordering where one exists.
# Assign back into the data frame so the conversion is kept
suicide_oz$sex <- suicide_oz$sex %>% as.factor()
suicide_oz$sex %>% head()
## [1] male male male male male female
## Levels: female male
suicide_oz$age <- factor(suicide_oz$age, levels = c("5-14 years", "15-24 years", "25-34 years", "35-54 years", "55-74 years", "75+ years"), ordered = TRUE)
suicide_oz$age %>% head()
## [1] 75+ years 25-34 years 55-74 years 15-24 years 35-54 years 75+ years
## 6 Levels: 5-14 years < 15-24 years < 25-34 years < ... < 75+ years
# Levels listed from oldest to youngest generation (spellings as they appear in the data)
suicide_oz$generation <- factor(suicide_oz$generation, levels = c("G.I. Generation", "Silent", "Boomers", "Generation X", "Millenials", "Generation Z"), ordered = TRUE)
suicide_oz$generation %>% head()
## [1] G.I. Generation Boomers G.I. Generation Generation X
## [5] Silent G.I. Generation
## 6 Levels: G.I. Generation < Silent < Boomers < ... < Generation Z
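The effect of declaring an ordered factor can be sketched on a small example. The vector `age_raw` below is a hypothetical stand-in for `suicide_oz$age`, not data from the report; the sketch only illustrates that comparisons and sorting follow the declared level order rather than alphabetical order.

```r
# Toy vector standing in for suicide_oz$age (values are illustrative only)
age_raw <- c("75+ years", "25-34 years", "5-14 years", "35-54 years")

age_levels <- c("5-14 years", "15-24 years", "25-34 years",
                "35-54 years", "55-74 years", "75+ years")

age_fct <- factor(age_raw, levels = age_levels, ordered = TRUE)

# Ordered factors can be compared; "75+ years" ranks above "25-34 years"
age_fct[1] > age_fct[2]

# Sorting follows the level order, not the alphabetical order of the labels
sort(age_fct)

# Values missing from `levels` would become NA, so it is worth checking
anyNA(age_fct)
```

A misspelled level is the usual cause of unexpected `NA` values after such a conversion, which is why the final check is included.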
Before further analysis, the data need to be checked for missing values and outliers. As shown below, no missing values or outliers are detected.
This report focuses on one variable: rates. To gain a visual overview of suicide rates across time and relative to GDP per capita, aggregated data without sex and age grouping are imported.
sum(is.na(suicide_oz$rates))
## [1] 0
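As a numeric complement to a visual boxplot check, outliers can also be listed directly. This is a minimal sketch on a hypothetical vector `rates_toy` (not the report's data), using base R's `boxplot.stats()`, which applies the same 1.5 × IQR rule that `boxplot()` uses to draw points beyond the whiskers.

```r
# Hypothetical rates vector; the last value is deliberately extreme
rates_toy <- c(1.2, 3.4, 5.1, 8.7, 12.3, 15.8, 20.4, 23.9, 27.5, 95.0)

# Count missing values
n_missing <- sum(is.na(rates_toy))

# boxplot.stats() returns the points lying more than 1.5 * IQR beyond the hinges
outliers <- boxplot.stats(rates_toy)$out

n_missing
outliers
```

An empty `outliers` vector would correspond to a boxplot with no points drawn beyond the whiskers.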
rates_outliers <- suicide_oz$rates %>% boxplot()
gdp_outliers <- suicide_oz$`gdp_per_capita ($)` %>% boxplot()
australia <- read_xls("master.xls", sheet = 2)
The trendline() function lets us observe the overall trend in Australia's suicide rates from 1985 to 2015. The graph shows that the rate does not follow a clear linear pattern: it declined from 1985 to around 2005 and then slowly increased.
library(basicTrendline)  # assumed source of trendline(); the original does not name the package
trendline(australia$year, australia$rates)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.9771 -10.2942 1.1903 8.5782 26.4003
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3413.98781 503.60037 6.7792 2.311e-07 ***
## x -1.62917 0.25182 -6.4696 5.222e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.475 on 28 degrees of freedom
## Multiple R-squared: 0.59918, Adjusted R-squared: 0.58486
## F-statistic: 41.856 on 1 and 28 DF, p-value: 5.2219e-07
##
##
## N: 30 , AIC: 240.49 , BIC: 244.69
## Residual Sum of Squares: 4357.5
The scatter plots depict the relationship between suicide rates and GDP per capita, to check whether the two variables are correlated. The relationship shows only weak linearity, even after a logarithmic transformation of both variables. However, suicide rates appear to be higher when GDP per capita is below $25,000.
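The strength of such an association can also be quantified with a correlation test. The sketch below uses two hypothetical vectors, `gdp_toy` and `rates_toy`, which are not taken from master.xls; it only illustrates how Pearson's coefficient (linear association) and Spearman's coefficient (monotonic association) could be compared with base R's `cor.test()`.

```r
# Hypothetical yearly aggregates; not taken from master.xls
gdp_toy   <- c(12000, 18000, 21000, 24000, 30000, 38000, 45000, 52000, 60000, 65000)
rates_toy <- c(160, 155, 150, 148, 140, 125, 120, 124, 128, 130)

# Pearson measures linear association; Spearman only assumes monotonicity
pearson  <- cor.test(gdp_toy, rates_toy, method = "pearson")
spearman <- cor.test(gdp_toy, rates_toy, method = "spearman")

pearson$estimate
spearman$estimate
```

If the Spearman coefficient were clearly stronger than the Pearson coefficient, that would point to a monotonic but non-linear relationship.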
plot(australia$rates ~ australia$`gdp_per_capita ($)`)
plot(log(australia$rates) ~ log(australia$`gdp_per_capita ($)`))
The table below summarises the descriptive statistics of suicide rates in different years.
suicide_oz %>% group_by(year) %>% summarise(Min = min(rates,na.rm = TRUE),
Q1 = quantile(rates,probs = .25,na.rm = TRUE),
Median = median(rates, na.rm = TRUE),
Q3 = quantile(rates,probs = .75,na.rm = TRUE),
Max = max(rates,na.rm = TRUE),
Mean = mean(rates, na.rm = TRUE),
SD = sd(rates, na.rm = TRUE),
n = n()) -> table1
table1 %>% head()
To answer our primary question, we test whether the mean suicide rates in 1985 and 2015 differ, using the original dataset suicide_oz. The null hypothesis (H0) is that the mean of the 1985 rates equals the mean of the 2015 rates.
\[H_0: \mu_1=\mu_2 \]
\[H_A: \mu_1 \ne\mu_2\]
The independent t-test first assumes that the population data are normally distributed. Because the suicide rates in the original data are grouped by age and sex, the sample size in each year is the number of age-by-sex groups (12). This is small (<30), so Q-Q plots are used to inspect normality. As shown below, all data points fall inside the 95% CI for the normal quantiles, so it is reasonable to assume normality.
suicide_1985 <- suicide_oz %>% filter(year == 1985)
suicide_1985$rates %>% qqPlot(dist = "norm")
## [1] 1 2
suicide_2015 <- suicide_oz %>% filter(year == 2015)
suicide_2015$rates %>% qqPlot(dist = "norm")
## [1] 1 2
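A formal test can supplement the visual Q-Q inspection. The sketch below applies base R's `shapiro.test()` to a hypothetical sample of 12 values, `rates_toy`, generated for illustration; it is not the report's data, and the Shapiro-Wilk test is offered here as an optional complement rather than the method used in this report.

```r
set.seed(1)  # reproducible toy data
rates_toy <- rnorm(12, mean = 13, sd = 10)  # hypothetical group of 12 rates

sw <- shapiro.test(rates_toy)

sw$statistic  # W statistic, close to 1 for normal-looking samples
sw$p.value    # a large p-value means no evidence against normality
```

With only 12 observations the test has limited power, so it is best read alongside the Q-Q plot rather than instead of it.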
The second assumption of the independent t-test is homogeneity of variance in the populations, which is inspected with Levene’s test. As p = 0.8014 > 0.05, equal variances are assumed.
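The idea behind Levene's test can be sketched in base R: take each observation's absolute deviation from its group median, then run a one-way ANOVA on those deviations. The data frame `toy` below is hypothetical (two groups of 12 simulated rates), and the median-centred form shown is an assumption about which variant is wanted.

```r
# Hypothetical rates for two groups of 12; not the report's data
set.seed(42)
toy <- data.frame(
  year  = factor(rep(c("1985", "2015"), each = 12)),
  rates = c(rnorm(12, mean = 13.6, sd = 10), rnorm(12, mean = 12.8, sd = 10))
)

# Absolute deviation of each observation from its group median
toy$abs_dev <- abs(toy$rates - ave(toy$rates, toy$year, FUN = median))

# One-way ANOVA on the deviations gives the Levene F statistic and p-value
levene <- anova(lm(abs_dev ~ year, data = toy))
levene
```

A p-value above the chosen significance level gives no evidence that the group variances differ, which supports using the pooled-variance t-test.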
suicide_new <- suicide_oz %>% filter(year == 1985 | year == 2015)
suicide_new$year <- factor(suicide_new$year, levels = c("1985", "2015"), ordered = TRUE)
leveneTest(suicide_new$rates ~ suicide_new$year)
Now we can perform the hypothesis test with t.test(), using a 5% significance level.
Estimated difference between means = 13.61750 - 12.84833 = 0.76917. The test found no statistically significant difference between the mean 1985 and 2015 suicide rates, t(df = 22) = 0.178, p = 0.8607, 95% CI for the difference in means [-8.214148, 9.752482]. Because p > 0.05 and the confidence interval contains 0, our decision is to fail to reject H0.
t.test(
  suicide_new$rates ~ suicide_new$year,
  var.equal = TRUE,
  alternative = "two.sided"
)
##
## Two Sample t-test
##
## data: suicide_new$rates by suicide_new$year
## t = 0.17757, df = 22, p-value = 0.8607
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.214148 9.752482
## sample estimates:
## mean in group 1985 mean in group 2015
## 13.61750 12.84833
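The reported interval and p-value can be reproduced from the summary figures in the output above, which makes the decision rule explicit. This is a sketch using only the printed means, t statistic and degrees of freedom; the one-sided p-value at the end is an added illustration for the directional question "was the 1985 mean higher?" and is not part of the original two-sided test.

```r
# Values copied from the t.test() output above
mean_1985 <- 13.61750
mean_2015 <- 12.84833
t_stat    <- 0.17757
df        <- 22

diff_means <- mean_1985 - mean_2015   # estimated difference in means
se_diff    <- diff_means / t_stat     # standard error implied by t = diff / SE
t_crit     <- qt(0.975, df)           # two-sided 5% critical value

# 95% confidence interval for the difference in means
ci <- diff_means + c(-1, 1) * t_crit * se_diff

# Two-sided p-value, as reported by t.test()
p_two <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

# One-sided alternative "1985 mean > 2015 mean" (an assumption; the report tests two-sided)
p_upper <- pt(t_stat, df, lower.tail = FALSE)

round(ci, 2)
round(p_two, 4)
round(p_upper, 4)
```

Both p-values are well above 0.05 and the interval contains 0, so neither the two-sided nor the one-sided version would lead to rejecting H0.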
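Our investigation found no statistically significant difference between Australia’s suicide rates in 2015 and in 1985. Although the 2015 sample mean (12.85) is slightly lower than the 1985 sample mean (13.62), the data do not provide evidence that the rate has fallen, so the result cannot be taken as a sign that effective suicide prevention is in place.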
A strength of this investigation is that the data satisfy the assumptions of the independent t-test. We use “rates” instead of the number of suicides to remove the influence of population change. The dataset is also grouped by sex and age, which allows the data to be analysed in different ways. A major limitation of this dataset is that its data collection is undocumented: it is unclear how the information was gathered and what sampling method was adopted.
In future analysis, we could look at the correlation between suicide rates and the suicide prevention measures in place, which may help explain changes in suicide rates over time.
Data source:
United Nations Development Program. (2018). Human development index (HDI). Retrieved from http://hdr.undp.org/en/indicators/137506
World Bank. (2018). World development indicators: GDP (current US$) by country:1985 to 2016. Retrieved from http://databank.worldbank.org/data/source/world-development-indicators#
[Szamil]. (2017). Suicide in the Twenty-First Century [dataset]. Retrieved from https://www.kaggle.com/szamil/suicide-in-the-twenty-first-century/notebook
World Health Organization. (2018). Suicide prevention. Retrieved from http://www.who.int/mental_health/suicide-prevention/en/