For this homework, I am using the Kaggle dataset Suicide Rates Overview 1985 to 2016.
I am interested in whether gender or age influences suicide numbers. Here I am using the data for the United States from 2015, which is the latest available data.
Here we are importing the suicide data set.
pacman::p_load(Zelig,pander,texreg,lmtest,visreg,tidyverse,shiny,readr,knitr)
master <- read_csv("Desktop/master.csv")
## Parsed with column specification:
## cols(
## country = col_character(),
## year = col_double(),
## sex = col_character(),
## age = col_character(),
## suicides_no = col_double(),
## population = col_double(),
## `suicides/100k pop` = col_double(),
## `country-year` = col_character(),
## `HDI for year` = col_double(),
## `gdp_for_year ($)` = col_number(),
## `gdp_per_capita ($)` = col_double(),
## generation = col_character()
## )
suicide<-sjlabelled::remove_all_labels(master)
# This removes haven labels so that the variables can be recoded afterwards
head(suicide)
## country year sex age suicides_no population suicides.100k.pop
## 1 Albania 1987 male 15-24 years 21 312900 6.71
## 2 Albania 1987 male 35-54 years 16 308000 5.19
## 3 Albania 1987 female 15-24 years 14 289700 4.83
## 4 Albania 1987 male 75+ years 1 21800 4.59
## 5 Albania 1987 male 25-34 years 9 274300 3.28
## 6 Albania 1987 female 75+ years 1 35600 2.81
## country.year HDI.for.year gdp_for_year.... gdp_per_capita....
## 1 Albania1987 NA 2156624900 796
## 2 Albania1987 NA 2156624900 796
## 3 Albania1987 NA 2156624900 796
## 4 Albania1987 NA 2156624900 796
## 5 Albania1987 NA 2156624900 796
## 6 Albania1987 NA 2156624900 796
## generation
## 1 Generation X
## 2 Silent
## 3 Generation X
## 4 G.I. Generation
## 5 Boomers
## 6 G.I. Generation
suicide1=select(suicide,'country', 'year', 'sex', 'suicides_no', 'generation','HDI.for.year')
head(suicide1)
## country year sex suicides_no generation HDI.for.year
## 1 Albania 1987 male 21 Generation X NA
## 2 Albania 1987 male 16 Silent NA
## 3 Albania 1987 female 14 Generation X NA
## 4 Albania 1987 male 1 G.I. Generation NA
## 5 Albania 1987 male 9 Boomers NA
## 6 Albania 1987 female 1 G.I. Generation NA
S1=filter(suicide1, country=="United States")
head(S1)
## country year sex suicides_no generation HDI.for.year
## 1 United States 1985 male 2177 G.I. Generation 0.841
## 2 United States 1985 male 5302 G.I. Generation 0.841
## 3 United States 1985 male 5134 Boomers 0.841
## 4 United States 1985 male 6053 Silent 0.841
## 5 United States 1985 male 4267 Generation X 0.841
## 6 United States 1985 female 2105 Silent 0.841
dim(S1)
## [1] 372 6
length(unique(S1$country))
## [1] 1
S2=filter(S1, year==2015)
head(S2)
## country year sex suicides_no generation HDI.for.year
## 1 United States 2015 male 3171 Silent NA
## 2 United States 2015 male 9068 Boomers NA
## 3 United States 2015 male 11634 Generation X NA
## 4 United States 2015 male 5503 Millenials NA
## 5 United States 2015 male 4359 Millenials NA
## 6 United States 2015 female 4053 Generation X NA
Recode age and gender groups.
Variable Index:
age (coded from generation, youngest to oldest):
1=Generation Z
2=Millenials
3=Generation X
4=Boomers
5=Silent
6=G.I. Generation
Gender:
0 = Male
1 = Female
suicide2<-S2%>%
mutate(
# generation codes run from youngest (1) to oldest (6)
age=recode(generation, 'Generation Z'=1,'Millenials'=2, 'Generation X'=3,'Boomers'=4,'Silent'=5,'G.I. Generation'=6),
# 0 = male, 1 = female
gender=recode(sex, 'male'=0, 'female'=1)
)%>%
select(age, gender,suicides_no,HDI.for.year)
head(suicide2)
## age gender suicides_no HDI.for.year
## 1 5 0 3171 NA
## 2 4 0 9068 NA
## 3 3 0 11634 NA
## 4 2 0 5503 NA
## 5 2 0 4359 NA
## 6 3 1 4053 NA
Calculating the mean number of suicides for each age group in 2015.
Age group 3 (Generation X) has the largest average number of suicide cases.
suicide2 %>%
group_by(age) %>%
summarize(mean_suicides_no= mean(suicides_no)) %>%
kable()
| age | mean_suicides_no |
|---|---|
| 1 | 206.5 |
| 2 | 3109.5 |
| 3 | 7843.5 |
| 4 | 5970.0 |
| 5 | 1855.5 |
Calculating the mean number of suicides by age group and gender.
From the table below, we can see that males account for more suicide cases than females in every age group. Males in age group 3 (Generation X) have the largest average number of suicide cases.
suicide2 %>%
group_by(age, gender)%>%
summarize(mean_suicides_no= mean(suicides_no)) %>%
kable()
| age | gender | mean_suicides_no |
|---|---|---|
| 1 | 0 | 255 |
| 1 | 1 | 158 |
| 2 | 0 | 4931 |
| 2 | 1 | 1288 |
| 3 | 0 | 11634 |
| 3 | 1 | 4053 |
| 4 | 0 | 9068 |
| 4 | 1 | 2872 |
| 5 | 0 | 3171 |
| 5 | 1 | 540 |
In this table, the columns are gender (0 = male, 1 = female) and the rows are age groups. Each cell is the average number of suicide cases in that group.
suicide2 %>%
group_by(age, gender) %>%
summarize(mean_suicides_no= mean(suicides_no)) %>%
spread(gender, mean_suicides_no) %>%
kable()
| age | 0 | 1 |
|---|---|---|
| 1 | 255 | 158 |
| 2 | 4931 | 1288 |
| 3 | 11634 | 4053 |
| 4 | 9068 | 2872 |
| 5 | 3171 | 540 |
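The gap between the two columns is easier to compare across age groups as a ratio. A minimal sketch, assuming the cell means are typed in by hand from the table above (the data frame `means` and the column `ratio` are names introduced here for illustration only):

```r
# Cell means copied from the age-by-gender table above (0 = male, 1 = female).
means <- data.frame(
  age    = 1:5,
  male   = c(255, 4931, 11634, 9068, 3171),
  female = c(158, 1288, 4053, 2872, 540)
)

# Ratio of the male mean to the female mean within each age group.
means$ratio <- round(means$male / means$female, 2)
print(means)
```

The ratio is above 1 in every age group, smallest in age group 1 and largest in age group 5.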
model1 <- lm(suicides_no ~ age, data = suicide2)
model2 <- lm(suicides_no ~ age + gender, data = suicide2)
model3 <- lm(suicides_no ~ age*gender, data = suicide2)
summary(model1)
##
## Call:
## lm(formula = suicides_no ~ age, data = suicide2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4489 -2096 -1628 1480 7848
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1921.9 2468.4 0.779 0.454
## age 621.4 787.2 0.789 0.448
##
## Residual standard error: 3664 on 10 degrees of freedom
## Multiple R-squared: 0.05865, Adjusted R-squared: -0.03548
## F-statistic: 0.6231 on 1 and 10 DF, p-value: 0.4482
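Model 1: The intercept (1921.9) is the predicted number of suicides when the age code is 0, which lies outside the coded range. Each one-step increase in the age code is associated with 621.4 more suicides on average. Because this is a linear regression, the coefficients are in units of suicide counts, not log odds. The relationship between suicide numbers and age is not statistically significant (p = 0.448).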
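Whether a coefficient is statistically significant can be read from the `Pr(>|t|)` column of the summary output. A minimal sketch of pulling that value out programmatically, assuming hypothetical simulated data (`toy`, `fit`, and `p_age` are illustrative names, not part of the homework data):

```r
# Hypothetical toy data (NOT the Kaggle data): 12 rows with an age code and an outcome.
set.seed(1)
toy <- data.frame(age = rep(1:6, times = 2))
toy$y <- 100 + 5 * toy$age + rnorm(12, sd = 50)

fit <- lm(y ~ age, data = toy)

# coef(summary(fit)) is a matrix with the columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)".
coefs <- coef(summary(fit))
p_age <- coefs["age", "Pr(>|t|)"]

# Compare the p-value with the conventional 0.05 threshold.
significant <- p_age < 0.05
print(coefs)
print(significant)
```

A p-value above the chosen threshold means the data do not provide evidence of a linear relationship at that level.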
summary(model2)
##
## Call:
## lm(formula = suicides_no ~ age + gender, data = suicide2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4270.8 -1217.7 106.0 897.8 5865.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3904.5 2279.9 1.713 0.1209
## age 621.4 668.3 0.930 0.3767
## gender -3965.2 1795.9 -2.208 0.0546 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3111 on 9 degrees of freedom
## Multiple R-squared: 0.3894, Adjusted R-squared: 0.2537
## F-statistic: 2.87 on 2 and 9 DF, p-value: 0.1086
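Model 2: Holding age constant, females (gender = 1) have on average 3965.2 fewer suicides than males. This difference is only marginally significant (p = 0.0546, significant at the 0.1 level but not at 0.05). The age coefficient is unchanged at 621.4 and is still not significant (p = 0.377).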
summary(model3)
##
## Call:
## lm(formula = suicides_no ~ age * gender, data = suicide2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4634.6 -1234.0 -199.5 1218.8 5804.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2865.8 3090.4 0.927 0.381
## age 988.0 985.5 1.002 0.345
## gender -1887.7 4370.5 -0.432 0.677
## age:gender -733.2 1393.7 -0.526 0.613
##
## Residual standard error: 3244 on 8 degrees of freedom
## Multiple R-squared: 0.4098, Adjusted R-squared: 0.1885
## F-statistic: 1.852 on 3 and 8 DF, p-value: 0.2161
Finally, Model 3 adds an interaction between age and gender. For males (gender = 0), a one-step increase in the age code is associated with 988.0 more suicides. At an age code of 0, females are predicted to have 1887.7 fewer suicides than males. The interaction coefficient (-733.2) means that the age slope for females is 733.2 lower than the slope for males. None of these coefficients is statistically significant (all p > 0.3).
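In a linear model with an age-by-gender interaction, the intercept and age slope for each gender follow directly from the four coefficients. A minimal sketch, assuming the rounded estimates printed by `summary(model3)` above are typed in by hand (the vector `b` and the variable names are illustrative):

```r
# Rounded coefficients copied from the summary(model3) output above.
b <- c(intercept = 2865.8, age = 988.0, gender = -1887.7, age_gender = -733.2)

# gender = 0 (male): the gender and interaction terms drop out.
male_intercept <- b[["intercept"]]
male_slope     <- b[["age"]]

# gender = 1 (female): add the gender and interaction coefficients.
female_intercept <- b[["intercept"]] + b[["gender"]]
female_slope     <- b[["age"]] + b[["age_gender"]]

print(c(male_intercept = male_intercept, male_slope = male_slope,
        female_intercept = female_intercept, female_slope = female_slope))
```

Up to rounding, the female values (978.1 and 254.8) agree with the female-only model fitted later in this document (978.03 and 254.75), which is what an interaction model with a binary moderator implies.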
htmlreg(list(model1,model2,model3))
| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| (Intercept) | 1921.89 | 3904.48 | 2865.75 |
| | (2468.41) | (2279.88) | (3090.39) |
| age | 621.36 | 621.36 | 987.97 |
| | (787.17) | (668.28) | (985.51) |
| gender | | -3965.17 | -1887.72 |
| | | (1795.94) | (4370.47) |
| age:gender | | | -733.22 |
| | | | (1393.73) |
| R2 | 0.06 | 0.39 | 0.41 |
| Adj. R2 | -0.04 | 0.25 | 0.19 |
| Num. obs. | 12 | 12 | 12 |
| RMSE | 3664.06 | 3110.66 | 3243.72 |
| \*\*\*p < 0.001, \*\*p < 0.01, \*p < 0.05 | | | |
From the results, 6% of the total variation in suicide numbers is explained by age alone (Model 1), 39% by age and gender together (Model 2), and 41% by age, gender, and their interaction (Model 3).
Model 3 has the largest R^2, but R^2 cannot decrease when terms are added. Model 2 has the largest adjusted R^2 (0.25 versus 0.19 for Model 3), and the interaction term in Model 3 is not statistically significant (p = 0.613).
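A larger R^2 alone does not show that a bigger model is better, because R^2 cannot decrease when terms are added. A minimal sketch of comparing nested models with adjusted R^2 and sequential F tests, assuming hypothetical simulated data (`toy`, `m1`, `m2`, `m3` are illustrative and do not reproduce the homework results):

```r
# Hypothetical toy data (NOT the Kaggle data): 6 age codes for each of 2 genders.
set.seed(42)
toy <- data.frame(
  age    = rep(1:6, times = 2),
  gender = rep(c(0, 1), each = 6)
)
toy$y <- 5000 + 300 * toy$age - 2000 * toy$gender + rnorm(12, sd = 1500)

m1 <- lm(y ~ age, data = toy)
m2 <- lm(y ~ age + gender, data = toy)
m3 <- lm(y ~ age * gender, data = toy)

# R-squared never goes down as terms are added; adjusted R-squared can.
r2 <- sapply(list(m1 = m1, m2 = m2, m3 = m3), function(m) {
  s <- summary(m)
  c(r.squared = s$r.squared, adj.r.squared = s$adj.r.squared)
})
print(round(r2, 3))

# Sequential F tests: does each added term improve the fit significantly?
cmp <- anova(m1, m2, m3)
print(cmp)
```

Applied to the real models, the same call would be `anova(model1, model2, model3)`.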
suicide2F <- suicide2 %>% filter(gender == 1)
suicide2M <- suicide2 %>% filter(gender == 0)
modelF <- lm(suicides_no ~ age, data = suicide2F)
modelM <- lm(suicides_no ~ age, data = suicide2M)
texreg(list(modelF, modelM, model3), caption = "", custom.model.names = c("Female", "Male", "Both"), digits = 3)
##
## \begin{table}
## \begin{center}
## \begin{tabular}{l c c c }
## \hline
## & Female & Male & Both \\
## \hline
## (Intercept) & $978.031$ & $2865.754$ & $2865.754$ \\
## & $(1530.208)$ & $(4093.829)$ & $(3090.386)$ \\
## age & $254.754$ & $987.969$ & $987.969$ \\
## & $(487.978)$ & $(1305.507)$ & $(985.513)$ \\
## gender & & & $-1887.723$ \\
## & & & $(4370.466)$ \\
## age:gender & & & $-733.215$ \\
## & & & $(1393.726)$ \\
## \hline
## R$^2$ & 0.064 & 0.125 & 0.410 \\
## Adj. R$^2$ & -0.170 & -0.093 & 0.188 \\
## Num. obs. & 6 & 6 & 12 \\
## RMSE & 1606.132 & 4296.951 & 3243.721 \\
## \hline
## \multicolumn{4}{l}{\scriptsize{$^{***}p<0.001$, $^{**}p<0.01$, $^*p<0.05$}}
## \end{tabular}
## \caption{}
## \label{table:coefficients}
## \end{center}
## \end{table}
htmlreg(list(modelF,modelM))
| | Model 1 | Model 2 |
|---|---|---|
| (Intercept) | 978.03 | 2865.75 |
| | (1530.21) | (4093.83) |
| age | 254.75 | 987.97 |
| | (487.98) | (1305.51) |
| R2 | 0.06 | 0.13 |
| Adj. R2 | -0.17 | -0.09 |
| Num. obs. | 6 | 6 |
| RMSE | 1606.13 | 4296.95 |
| \*\*\*p < 0.001, \*\*p < 0.01, \*p < 0.05 | | |
From the results, age explains 6% of the total variation in suicide numbers among females (Model 1 in the table above) and 13% among males (Model 2).
The male model has the larger R^2. These two models contain no interaction term, and the age coefficient is not statistically significant in either of them: in both, the estimate is smaller than its standard error, and each model is fitted to only 6 observations.
library(interactions)
interact_plot(model3, pred = gender, modx = age)
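Looking at this graphic, we can see that females committed far fewer suicides than males. The fitted lines suggest more suicides in older age groups, most of all among older males, in the United States during 2015. However, none of the model coefficients is statistically significant, and the group means above peak in age group 3 (Generation X) rather than in the oldest group.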
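The plot is built from fitted values of model3. A minimal sketch of computing such fitted values by hand over a grid of age codes and genders, assuming the rounded coefficients from `summary(model3)` above are typed in (`grid` and the `b_*` names are illustrative):

```r
# Rounded coefficients copied from the summary(model3) output above.
b0       <- 2865.8   # intercept
b_age    <- 988.0    # age slope for gender = 0 (male)
b_gender <- -1887.7  # female shift at age code 0
b_int    <- -733.2   # change in the age slope for gender = 1 (female)

# Every combination of the observed age codes and the two genders.
grid <- expand.grid(age = 1:5, gender = c(0, 1))

# Linear predictor of the interaction model.
grid$fitted <- b0 + b_age * grid$age + b_gender * grid$gender +
  b_int * grid$age * grid$gender
print(grid)
```

These are values on the fitted straight lines, not the observed group means, so they rise steadily with the age code even though the observed means do not.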
Based on the results above, a future study could ask why older males committed more suicides. It may reveal further issues concerning the mental health of older people.