I will use the Suicide Rates Overview 1985 to 2016 data from Kaggle, which pairs socio-economic information with suicide rates by year and country, to explore the effect of gender and age on suicide rates in these countries. The data contain information at two conceptual levels, individual and country: individuals, with their demographic characteristics, are nested within countries. My hypothesis is that sex is associated with suicide rates: men commit suicide at higher rates than women across all age groups.
I will work with the following packages: library(nlme), library(dplyr), library(magrittr), library(tidyr), library(haven), library(lmerTest), library(ggplot2), library(texreg).
Now, importing data for analysis:
library(readr)
master<-read_csv("C:/Users/Marcy/Documents/soc 712/master.csv")
Parsed with column specification:
cols(
country = col_character(),
year = col_double(),
sex = col_character(),
age = col_character(),
suicides_no = col_double(),
population = col_double(),
`suicides/100k pop` = col_double(),
`country-year` = col_character(),
`HDI for year` = col_double(),
`gdp_for_year ($)` = col_number(),
`gdp_per_capita ($)` = col_double(),
generation = col_character()
)
head(master)
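Because suicides_no is a raw count for each country, year, sex, and age combination, it depends heavily on population size, while `suicides/100k pop` is a rate. As a minimal sketch of how the two relate, assuming a tiny made-up data frame (toy, with invented values that are not taken from master.csv):

```r
# Hypothetical rows that mimic the structure of master.csv (values are invented)
toy <- data.frame(
  country     = c("A", "A", "B", "B"),
  sex         = c("female", "male", "female", "male"),
  suicides_no = c(10, 30, 200, 650),
  population  = c(500000, 480000, 9000000, 8700000)
)

# Rate per 100,000 people: count divided by population, scaled by 1e5
toy$rate_100k <- toy$suicides_no / toy$population * 1e5

print(toy)
```

A large country can have a high count and still a modest rate, which is worth keeping in mind when the models below use suicides_no as the outcome.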
First, I will ignore the country-level data and analyze the data at the individual level by fitting a complete pooling model.
cpooling <- lm(suicides_no ~ sex, data = master)
summary(cpooling)
Call:
lm(formula = suicides_no ~ sex, data = master)
Residuals:
    Min      1Q  Median      3Q     Max
 -373.0  -325.0  -111.1   -57.1 21965.0
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  112.114      7.568   14.81   <2e-16 ***
sexmale      260.920     10.703   24.38   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 892.6 on 27818 degrees of freedom
Multiple R-squared: 0.02092, Adjusted R-squared: 0.02088
F-statistic: 594.3 on 1 and 27818 DF, p-value: < 2.2e-16
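In the complete pooling model shown above, sex is the only predictor. Its coefficient is highly statistically significant (p < 2e-16), although sex alone explains only about 2% of the variance (R-squared = 0.021). The intercept, 112, is the average number of suicides per observation for females, and the sexmale coefficient, 261, is the additional number for males. The male average is therefore about 373, more than three times the female average. Note that the outcome, suicides_no, is a count rather than a rate.
Now, I will fit a no-pooling model, estimating the effect of sex on suicide counts separately within each country, to see how much it varies between countries.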
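To make the reading of the two coefficients concrete, here is a minimal sketch on simulated data (the data frame sim and all its values are invented for illustration, not drawn from master.csv). It shows that, with the default dummy coding, the intercept is the female mean and the intercept plus sexmale is the male mean.

```r
set.seed(1)
# Simulated stand-in for the data: 200 female and 200 male rows (values are invented)
sim <- data.frame(sex = rep(c("female", "male"), each = 200))
sim$suicides_no <- ifelse(sim$sex == "male", 370, 110) + rnorm(400, sd = 50)

fit <- lm(suicides_no ~ sex, data = sim)
b <- coef(fit)

# With dummy coding, the intercept is the female mean and
# intercept + sexmale is the male mean
female_mean <- unname(b["(Intercept)"])
male_mean   <- unname(b["(Intercept)"] + b["sexmale"])

# These match the raw group means
group_means <- tapply(sim$suicides_no, sim$sex, mean)
print(c(female = female_mean, male = male_mean))
print(group_means)
```

Applied to the output reported above, the same arithmetic gives 112.114 + 260.920 = 373.034 for males.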
# No pooling: fit a separate regression of suicide counts on sex within each country
dcoef <- master %>%
  group_by(country) %>%
  do(mod = lm(suicides_no ~ sex, data = .))
# Per-country intercepts: the mean female suicide count in each country
intercepts <- dcoef %>% do(data.frame(intc = coef(.$mod)[1]))
ggplot(intercepts, aes(x = intc)) + geom_density() + xlab("Country intercept (mean female suicide count)")
# Per-country sexmale coefficients: the male minus female difference in mean count
differences <- dcoef %>% do(data.frame(difference = coef(.$mod)[2]))
ggplot(differences, aes(x = difference)) + geom_histogram() + xlab("Male minus female difference in suicide counts, by country")
As shown above, the vast majority of countries have an intercept (the average female suicide count per observation) between 0 and 500. In a few countries, however, it is considerably higher, exceeding 1000. The difference between male and female counts also varies greatly by country.
Now, I will use a random effects model to allow for variation between countries within the regression model. I also add the interaction of sex and age, since the combination of age and sex could potentially have a significant additional effect on suicide rates.
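The no-pooling idea can also be sketched in base R without dplyr. This is only an illustration on simulated data: the five country names and the base and gap values are invented, and split() with lapply() is swapped in for the group_by() and do() pipeline used in the analysis.

```r
set.seed(2)
# Simulated stand-in: 5 hypothetical countries, each with its own female mean and male gap
countries <- paste0("country_", 1:5)
sim <- expand.grid(country = countries, sex = c("female", "male"), rep = 1:50,
                   stringsAsFactors = FALSE)
base <- c(50, 100, 200, 400, 800)      # invented female means
gap  <- c(20, 150, 300, 600, 1500)     # invented male minus female gaps
idx  <- match(sim$country, countries)
sim$suicides_no <- base[idx] + gap[idx] * (sim$sex == "male") +
  rnorm(nrow(sim), sd = 30)

# No pooling: one regression per country, fitted on that country's rows only
fits <- lapply(split(sim, sim$country), function(d) lm(suicides_no ~ sex, data = d))

# Collect each country's intercept and sexmale coefficient into one table
per_country <- data.frame(
  country    = names(fits),
  intercept  = sapply(fits, function(m) coef(m)[["(Intercept)"]]),
  difference = sapply(fits, function(m) coef(m)[["sexmale"]]),
  row.names  = NULL
)
print(per_country)
```

Each row of per_country uses only that country's data, so nothing is shared between countries. That is the opposite extreme from the complete pooling model.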
randomeffect=lme(suicides_no ~ sex*age, data = master, random = ~1|country, method = "ML")
summary(randomeffect)
Linear mixed-effects model fit by maximum likelihood
Data: master
Random effects:
Formula: ~1 | country
(Intercept) Residual
StdDev: 522.4601 655.4487
Fixed effects: suicides_no ~ sex * age
Correlation:
(Intr) sexmal a25-3y a35-5y a5-14y a55-7y ag75+y s:25-y s:35-y s:5-1y
sexmale -0.178
age25-34 years -0.178 0.500
age35-54 years -0.178 0.500 0.500
age5-14 years -0.178 0.499 0.499 0.499
age55-74 years -0.178 0.500 0.500 0.500 0.499
age75+ years -0.178 0.500 0.500 0.500 0.499 0.500
sexmale:age25-34 years 0.126 -0.707 -0.707 -0.354 -0.353 -0.354 -0.354
sexmale:age35-54 years 0.126 -0.707 -0.354 -0.707 -0.353 -0.354 -0.354 0.500
sexmale:age5-14 years 0.126 -0.706 -0.353 -0.353 -0.707 -0.353 -0.353 0.499 0.499
sexmale:age55-74 years 0.126 -0.707 -0.354 -0.354 -0.353 -0.707 -0.354 0.500 0.500 0.499
sexmale:age75+ years 0.126 -0.707 -0.354 -0.354 -0.353 -0.354 -0.707 0.500 0.500 0.499
s:55-y
sexmale
age25-34 years
age35-54 years
age5-14 years
age55-74 years
age75+ years
sexmale:age25-34 years
sexmale:age35-54 years
sexmale:age5-14 years
sexmale:age55-74 years
sexmale:age75+ years 0.500
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-5.24358160 -0.18933215 0.05006066 0.21000191 27.50033170
Number of Observations: 27820
Number of Groups: 101
Adding age groups to the model yields notable results. Sex, age, and the interaction of sex and age all have significant effects on the number of suicides. The standard deviation of the country-level random intercepts is about 522, compared with a residual standard deviation of about 655. The coefficients are differences in suicide counts relative to the reference category (females aged 15-24), so negative values are offsets from that baseline rather than negative counts. Males aged 35-54 are the most vulnerable group, with estimates as much as three times higher. At 75+ years, however, the estimate for men drops sharply: their coefficient is -107, which is below that of many other age groups and below the coefficient of 20 for women aged 75+. Overall, the lowest suicide counts are among children and young teenagers. For ages 5-14, the coefficient for males is -189, compared with -69 for females. Thus, I can conclude that children are less likely to commit suicide than older people, and that the estimated reduction relative to the baseline is larger for boys than for girls. Because these coefficients are offsets in counts, they do not on their own show how many times more likely one group is than the other.
Now, I will use a random slope model. Unlike a random intercept model, a random slope model allows each group's line to have a different slope, which means the explanatory variable can have a different effect in each group.
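Because interaction coefficients are offsets that have to be added to the main effects, it can help to rebuild one cell estimate by hand. The following is a sketch on simulated data, assuming invented values throughout (20 hypothetical countries, three of the age groups, made-up effect sizes). It fits the same kind of random intercept model with nlme and reconstructs the estimate for males aged 35-54.

```r
library(nlme)

set.seed(3)
# Simulated stand-in for the data (all values invented): 20 hypothetical
# countries, 2 sexes, 3 age groups, 10 rows per combination
sim <- expand.grid(country = paste0("c", 1:20),
                   sex = c("female", "male"),
                   age = c("15-24 years", "35-54 years", "5-14 years"),
                   rep = 1:10)
country_shift <- rnorm(20, sd = 40)
male  <- sim$sex == "male"
mid   <- sim$age == "35-54 years"
child <- sim$age == "5-14 years"
sim$suicides_no <- 60 + country_shift[as.integer(sim$country)] +
  100 * male + 50 * mid - 55 * child +
  120 * (male & mid) - 95 * (male & child) +
  rnorm(nrow(sim), sd = 20)

fit <- lme(suicides_no ~ sex * age, data = sim, random = ~ 1 | country, method = "ML")
b <- fixef(fit)

# Estimate for males aged 35-54: reference cell (females aged 15-24)
# plus both main effects plus the interaction offset
by_hand <- unname(b["(Intercept)"] + b["sexmale"] + b["age35-54 years"] +
                  b["sexmale:age35-54 years"])

# In this balanced simulation the same number is the raw mean of that cell
raw_mean <- mean(sim$suicides_no[male & mid])
print(c(by_hand = by_hand, raw_mean = raw_mean))
```

The agreement with the raw cell mean relies on the simulated design being balanced. With unbalanced real data the two can differ, but the rule for summing the coefficients stays the same.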
Slope=lme(suicides_no ~ sex*age, data = master, random = ~sex|country, method = "ML")
summary(Slope)
Linear mixed-effects model fit by maximum likelihood
Data: master
Random effects:
Formula: ~sex | country
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 227.4749 (Intr)
sexmale 608.3545 0.929
Residual 564.0490
Fixed effects: suicides_no ~ sex * age
Correlation:
(Intr) sexmal a25-3y a35-5y a5-14y a55-7y ag75+y s:25-y s:35-y s:5-1y
sexmale 0.706
age25-34 years -0.324 0.132
age35-54 years -0.324 0.132 0.500
age5-14 years -0.323 0.131 0.499 0.499
age55-74 years -0.324 0.132 0.500 0.500 0.499
age75+ years -0.324 0.132 0.500 0.500 0.499 0.500
sexmale:age25-34 years 0.229 -0.186 -0.707 -0.354 -0.353 -0.354 -0.354
sexmale:age35-54 years 0.229 -0.186 -0.354 -0.707 -0.353 -0.354 -0.354 0.500
sexmale:age5-14 years 0.228 -0.186 -0.353 -0.353 -0.707 -0.353 -0.353 0.499 0.499
sexmale:age55-74 years 0.229 -0.186 -0.354 -0.354 -0.353 -0.707 -0.354 0.500 0.500 0.499
sexmale:age75+ years 0.229 -0.186 -0.354 -0.354 -0.353 -0.354 -0.707 0.500 0.500 0.499
s:55-y
sexmale
age25-34 years
age35-54 years
age5-14 years
age55-74 years
age75+ years
sexmale:age25-34 years
sexmale:age35-54 years
sexmale:age5-14 years
sexmale:age55-74 years
sexmale:age75+ years 0.500
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-10.03880367 -0.16086250 0.01425447 0.16910246 27.96197319
Number of Observations: 27820
Number of Groups: 101
The results reveal a significant association of gender and age with suicide counts across the reported countries. There is also substantial variation between countries, with counts ranging from fewer than 10 to over 1000. The standard deviation of the random slope for sex is about 608, and it is strongly correlated (0.93) with the random intercept, so countries with higher baseline counts also tend to have larger gaps between men and women. Across the board, men are more than twice as likely to commit suicide as women. Adding age as an interaction term with gender plays a very important role. Men aged 35-54 are in the highest risk group. The model's estimate for males is 150.8, against 56.7 for females. Once age is also considered, the estimate for males aged 35-54 is, alarmingly, about three times higher.
Now, I will compare these three models to see which one fits the data best.
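To illustrate what a random slope means in practice, here is a sketch on simulated data (30 hypothetical countries, with invented values for the baseline, the gap, and the variances). It fits a random slope for sex and recovers each country's own male-female difference as the fixed sexmale effect plus that country's random deviation.

```r
library(nlme)

set.seed(4)
n_country <- 30
# Simulated stand-in: each hypothetical country gets its own baseline and its own male gap
sim <- expand.grid(country = factor(paste0("c", 1:n_country)),
                   sex = factor(c("female", "male")),
                   rep = 1:20)
u0 <- rnorm(n_country, sd = 50)   # invented country deviations in the female baseline
u1 <- rnorm(n_country, sd = 80)   # invented country deviations in the male gap
k  <- as.integer(sim$country)
sim$suicides_no <- 100 + u0[k] + (150 + u1[k]) * (sim$sex == "male") +
  rnorm(nrow(sim), sd = 30)

# Random intercept and random slope for sex, both varying by country
slope_fit <- lme(suicides_no ~ sex, data = sim, random = ~ sex | country, method = "ML")

# Country-specific male-female gap: fixed effect plus the country's random deviation
country_slopes <- fixef(slope_fit)[["sexmale"]] + ranef(slope_fit)[["sexmale"]]
print(summary(country_slopes))
```

The spread of country_slopes is what the StdDev row for sexmale summarises in the model output.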
AIC(cpooling, randomeffect, Slope)
The random slope model appears to fit the data best, as its AIC value is lower than those of the other two models.
To conclude, my hypothesis is supported: men commit suicide at higher rates than women, but the size of the gender gap varies significantly across age groups, as shown in my tables.
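The comparison can be sketched end to end on simulated data. Everything here is an assumption made for illustration: the data are invented, the model names pooled, rand_int, and rand_slope are hypothetical, and the likelihood-ratio test via anova() is an extra step not used in the analysis. The mixed models are fitted with method = "ML", as in the text, so their likelihoods are on the same footing as the lm fit.

```r
library(nlme)

set.seed(5)
n_country <- 30
# Simulated stand-in with a genuinely varying male gap across hypothetical countries
sim <- expand.grid(country = factor(paste0("c", 1:n_country)),
                   sex = factor(c("female", "male")),
                   rep = 1:20)
k  <- as.integer(sim$country)
u0 <- rnorm(n_country, sd = 50)
u1 <- rnorm(n_country, sd = 80)
sim$suicides_no <- 100 + u0[k] + (150 + u1[k]) * (sim$sex == "male") +
  rnorm(nrow(sim), sd = 30)

pooled     <- lm(suicides_no ~ sex, data = sim)
rand_int   <- lme(suicides_no ~ sex, data = sim, random = ~ 1 | country, method = "ML")
rand_slope <- lme(suicides_no ~ sex, data = sim, random = ~ sex | country, method = "ML")

# Lower AIC indicates a better trade-off between fit and number of parameters
aic_table <- AIC(pooled, rand_int, rand_slope)
print(aic_table)

# For the two nested mixed models, a likelihood-ratio test is also available
print(anova(rand_int, rand_slope))
```

In this simulation the random slope model has the lowest AIC by construction, since the data were generated with country-specific gaps.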