Mental Health Awareness

An impact study

Syed Hassan Afsar - 3734089, Arjun Khopkar - 3729445, Siddharth Sharma - 3738019

Last updated: 28 October, 2018

Introduction

Problem Statement

Data

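#Loading the packages the code below relies on (qqPlot is assumed to come from car)
library(readr)
library(dplyr)
library(kableExtra)
library(car)

#Reading the data into R and checking successful import by displaying dimensions
who <- read_csv("C:/Users/Syed Hassan Afsar/Downloads/RMIT 1st Semester/Intro to Statistics/Assignment 3/who_suicide_statistics.csv")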

dim(who)
## [1] 43776     6

Data Preprocessing - Steps (1/2)

#Importing reduced dataset used for analysis

who_final <- read_csv("C:/Users/Syed Hassan Afsar/Downloads/RMIT 1st Semester/Intro to Statistics/Assignment 3/Who_processed.csv")

dim(who_final)
## [1] 1566    3

Data Preprocessing - Steps (2/2)

  1. UniqueIdentifier - Combination of the country, age and sex variables from the original dataset.

  2. Suicide%_2000 (expressed as a number without the % sign) - The suicide rate for the year 2000, computed as suicide_no/population and expressed as a percentage, for a particular country, age and sex bracket.

  3. Suicide%_2014 (expressed as a number without the % sign) - The suicide rate for the year 2014, computed as suicide_no/population and expressed as a percentage, for a particular country, age and sex bracket.

#Re-checking missing values
any(is.na(who_final))
## [1] FALSE
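
The source imports the reduced file directly and does not show how it was built. The following is a minimal sketch, on toy data, of one way the three variables described above could be derived. The raw column names (country, year, sex, age, suicides_no, population), the rounding to two decimals and the dropping of incomplete 2000/2014 pairs are all assumptions, not the authors' confirmed steps.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the raw WHO table (hypothetical values, assumed column names)
raw <- tibble(
  country     = c("A", "A", "A", "A"),
  year        = c(2000, 2014, 2000, 2014),
  sex         = c("male", "male", "female", "female"),
  age         = c("15-24 years", "15-24 years", "15-24 years", "15-24 years"),
  suicides_no = c(20, 10, 5, NA),
  population  = c(100000, 100000, 100000, 100000)
)

processed <- raw %>%
  filter(year %in% c(2000, 2014)) %>%
  mutate(
    # One identifier per country, age and sex bracket
    UniqueIdentifier = paste(country, age, sex, sep = "_"),
    # Rate as a percentage of the population (rounding is an assumption)
    rate_pct = round(100 * suicides_no / population, 2)
  ) %>%
  select(UniqueIdentifier, year, rate_pct) %>%
  # One row per bracket, one column per year
  pivot_wider(names_from = year, values_from = rate_pct,
              names_prefix = "Suicide%_") %>%
  # Keep only brackets observed in both years
  drop_na()

processed
```

In this toy example the female bracket is dropped because its 2014 count is missing, which is why a later any(is.na(...)) check would return FALSE.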

Descriptive Statistics

#Adding the paired difference (2014 minus 2000) as a new column
who_final <- who_final %>% mutate(suicrate_diff = `Suicide%_2014` - `Suicide%_2000`)

#Summary statistics for 2000, 2014 and the paired difference
table1 <- who_final %>% summarise(Mean_2000 = mean(`Suicide%_2000`), SD_2000 = sd(`Suicide%_2000`),
                                  Q1_2000 = quantile(`Suicide%_2000`, probs = 0.25), Q3_2000 = quantile(`Suicide%_2000`, probs = 0.75),
                                  Max_2000 = max(`Suicide%_2000`), Min_2000 = min(`Suicide%_2000`),
                                  IQR_2000 = Q3_2000 - Q1_2000, Range_2000 = Max_2000 - Min_2000, n = n())

table2 <- who_final %>% summarise(Mean_2014 = mean(`Suicide%_2014`), SD_2014 = sd(`Suicide%_2014`),
                                  Q1_2014 = quantile(`Suicide%_2014`, probs = 0.25), Q3_2014 = quantile(`Suicide%_2014`, probs = 0.75),
                                  Max_2014 = max(`Suicide%_2014`), Min_2014 = min(`Suicide%_2014`),
                                  IQR_2014 = Q3_2014 - Q1_2014, Range_2014 = Max_2014 - Min_2014, n = n())

table3 <- who_final %>% summarise(Mean_Diff = mean(suicrate_diff), SD_Diff = sd(suicrate_diff),
                                  Q1_Diff = quantile(suicrate_diff, probs = 0.25), Q3_Diff = quantile(suicrate_diff, probs = 0.75),
                                  Max_Diff = max(suicrate_diff), Min_Diff = min(suicrate_diff),
                                  IQR_Diff = Q3_Diff - Q1_Diff, Range_Diff = Max_Diff - Min_Diff, n = n())
knitr::kable(table1) %>% kable_styling()
Mean_2000  SD_2000    Q1_2000  Q3_2000  Max_2000  Min_2000  IQR_2000  Range_2000  n
0.0168455  0.0223623  0        0.02     0.14      0         0.02      0.14        1566
knitr::kable(table2) %>% kable_styling()
Mean_2014  SD_2014    Q1_2014  Q3_2014  Max_2014  Min_2014  IQR_2014  Range_2014  n
0.0126692  0.0159921  0        0.02     0.12      0         0.02      0.12        1566

Descriptive Statistics

knitr::kable(table3) %>% kable_styling()
Mean_Diff   SD_Diff    Q1_Diff  Q3_Diff  Max_Diff  Min_Diff  IQR_Diff  Range_Diff  n
-0.0041762  0.0110441  -0.01    0        0.04      -0.06     0.01      0.1         1566

Visualization

matplot(t(data.frame(who_final$`Suicide%_2000`,
who_final$`Suicide%_2014`)),
type="b", pch=19, col=1, lty=1, xlab= "", ylab="Suicide Rate(%)",
xaxt = "n")
axis(1, at=1:2, labels=c("2000","2014"))

Descriptive Statistics and Visualization

Outlier Detection

boxplot(who_final$suicrate_diff,col = "blue", ylab = "Difference in suicide rates(%)", main = "Outlier detection for difference in suicide rates")

- The boxplot shows several values of the difference in suicide rates that lie beyond the whiskers and may be considered outliers.

Outlier Filtering

boxplot(who_final_filtered$suicrate_diff,col = "blue", ylab = "Difference in suicide rates(%)", main = "Outlier detection for difference in suicide rates")

- The exclusions substantially reduced the number of visible outliers, leaving 1,543 of the original 1,566 observations (df = 1542 in the paired t-test).
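
The source does not show how who_final_filtered was created, so the rule the authors applied is unknown. As an illustration only, the sketch below applies one common rule, the 1.5 x IQR fences that boxplot() itself uses, to a toy vector of differences. The toy values and the names toy, out_vals and toy_filtered are hypothetical.

```r
# Toy differences in suicide rates (hypothetical values) with one extreme point
toy <- data.frame(
  suicrate_diff = c(-0.01, 0, -0.01, 0.01, 0, -0.02, 0, -0.01, 0.01, 0, -0.01, 0.30)
)

# boxplot.stats() returns the points lying beyond the 1.5 x IQR whiskers
out_vals <- boxplot.stats(toy$suicrate_diff)$out

# Keep only the rows whose difference is not flagged as an outlier
toy_filtered <- toy[!(toy$suicrate_diff %in% out_vals), , drop = FALSE]

nrow(toy)           # rows before filtering
nrow(toy_filtered)  # rows after filtering
```

A rule of this kind removes every point drawn beyond the whiskers, whereas the filtered boxplot above still shows some outliers, so the authors most likely used a different or less strict criterion.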

Normality (1/2)

  1. Visualization : QQ - Plot
  2. Quantification : Shapiro-Wilk test
who_final_filtered$suicrate_diff %>% qqPlot(dist="norm",main = "QQ-Plot for difference in suicide rates", ylab = "Difference in suicide rates(%)")

## [1] 628 639

Normality (2/2)

shapiro.test(who_final_filtered$suicrate_diff)
## 
##  Shapiro-Wilk normality test
## 
## data:  who_final_filtered$suicrate_diff
## W = 0.71697, p-value < 2.2e-16

Hypothesis Testing

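- Null Hypothesis

\[H_0:\mu_\Delta = 0\]

- Alternate Hypothesis

\[H_A:\mu_\Delta \ne 0\]

Here \(\mu_\Delta\) is the population mean of the paired difference in suicide rates (2014 minus 2000).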

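  1. A paired-samples t-test is used because we are comparing the population mean difference (change) between two matched samples: the same country, age and sex brackets measured in 2000 and 2014.

  2. Although the Shapiro-Wilk test rejects normality of the differences, the sampling distribution of the mean difference is assumed to be approximately normal by the Central Limit Theorem, since n > 30.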
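
The n > 30 argument rests on the Central Limit Theorem: the mean of a large sample is approximately normal even when the underlying data are not. The simulation below is a rough illustration of that idea, not an analysis of the suicide data. The exponential distribution is an arbitrary skewed stand-in, and the number of replicates is an assumption chosen for speed.

```r
set.seed(42)   # fixed seed so the sketch is repeatable
n <- 1543      # sample size implied by df = 1542 in the paired t-test

# Draw 2000 samples from a strongly right-skewed distribution and keep each mean
sample_means <- replicate(2000, mean(rexp(n, rate = 1)))

# Theory: the means centre on 1 with standard error 1 / sqrt(n)
c(mean = mean(sample_means), sd = sd(sample_means), theory_se = 1 / sqrt(n))

# The histogram of the means is close to a bell shape despite the skewed source
hist(sample_means, main = "Sampling distribution of the mean", xlab = "Sample mean")
```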

Reject H0 if:

  1. The p-value is below the 0.05 significance level.

  2. The 95% CI of the mean difference does not capture the hypothesized mean difference of 0.

Otherwise, fail to reject H0.
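
The decision rule above can be sketched on a small toy example. The rates below are hypothetical, and the names rate_2000, rate_2014, reject_by_p and reject_by_ci are illustrative only. For a two-sided test at the 5% level, the p-value rule and the 95% CI rule always lead to the same decision.

```r
# Hypothetical matched suicide rates (%) for eight brackets
rate_2000 <- c(0.02, 0.03, 0.01, 0.05, 0.04, 0.02, 0.06, 0.03)
rate_2014 <- c(0.01, 0.02, 0.01, 0.03, 0.04, 0.01, 0.04, 0.02)

# Paired t-test on the change from 2000 to 2014
res <- t.test(rate_2014, rate_2000, paired = TRUE,
              alternative = "two.sided", conf.level = 0.95)

alpha <- 0.05
# Rule 1: p-value below the significance level
reject_by_p <- res$p.value < alpha
# Rule 2: the 95% CI does not contain the hypothesized difference of 0
reject_by_ci <- res$conf.int[1] > 0 || res$conf.int[2] < 0

c(reject_by_p = reject_by_p, reject_by_ci = reject_by_ci)
```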

Hypothesis Testing Cont.

pttest <- t.test(who_final_filtered$`Suicide%_2014`,
who_final_filtered$`Suicide%_2000`,
paired = TRUE,
alternative = "two.sided",
conf.level = .95
)

pttest
## 
##  Paired t-test
## 
## data:  who_final_filtered$`Suicide%_2014` and who_final_filtered$`Suicide%_2000`
## t = -14.908, df = 1542, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.004759514 -0.003652670
## sample estimates:
## mean of the differences 
##            -0.004206092
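
As a sanity check on the output above, the confidence interval can be approximately reproduced from the reported mean difference, t statistic and degrees of freedom, using SE = mean / t and CI = mean ± t_crit × SE. The variable names are illustrative, and because the inputs are the rounded values printed above, agreement with the reported interval is only approximate.

```r
# Values copied from the paired t-test output
mean_diff <- -0.004206092
t_stat    <- -14.908
df        <- 1542

n      <- df + 1              # number of matched pairs
se     <- mean_diff / t_stat  # standard error of the mean difference
t_crit <- qt(0.975, df)       # two-sided 95% critical value

# Lower and upper limits of the 95% confidence interval
ci <- mean_diff + c(-1, 1) * t_crit * se
ci
```

Both limits are negative, so the interval excludes 0, which agrees with the very small p-value: the mean suicide rate in the matched brackets was lower in 2014 than in 2000.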

Discussion

  1. The data does not capture the reasons for suicide, which could be a crucial factor in this analysis.

  2. 2014 is used as the benchmark year for this exercise. Mental health awareness first gained momentum in the latter part of the previous decade, so figures as of 2018 could be a better measure; however, data constraints prevented their use.
