Suicide Data Cases from 2015

The data I collected are from https://www.kaggle.com/omaymas/us-suicide-cases-in-2014, but they represent death cases from 2015, collected by the Centers for Disease Control and Prevention. The original dataset contains a large number of observations and over 77 variables, so I created a subset for my questions about suicides in the US.

The first step was to download and view the data and assess which variables are useful for my analysis.

library(lmtest)
library(tidyverse)
library(readr)
Suicide1 <- read_csv("C:\\Users\\Cespi\\Documents\\712\\2015_data.csv")
head(Suicide1)

Variables

The variables in the original dataset were previously coded. For my purposes, I first evaluated which codes I would like to use from the original dataset. Afterwards, I mutated the dependent variable and recoded the independent variables to better analyze the data.

Original Variables

* race_recode_3 - race, treated as an integer, where 1 = White, 2 = Black, and 3 = Other.
* manner_of_death - treated as an integer that codes the different manners of death that occurred in 2015.
* age_recode_12 - presented as a character variable, but only numeric codes are shown, representing age-group intervals.
* education_2003_revision - a numeric variable representing 9 levels of education.
* marital_status - the marital status of the deceased person.
* sex - the sex of the deceased person.

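library(dplyr)

Suicidedata <- Suicide1 %>%
  mutate(
    # 1 if the manner of death is suicide (code 2), 0 for any other manner
    Suicide = ifelse(manner_of_death == 2, 1, 0),
    # Recode the age-group codes into readable labels; other codes become NA
    age = as.numeric(age_recode_12),
    age = ifelse(age == 4, "15-24",
          ifelse(age == 5, "25-34",
          ifelse(age == 6, "35-44",
          ifelse(age == 7, "45-54",
          ifelse(age == 8, "55-64",
          ifelse(age == 9, "65-74", NA)))))),
    # Recode the education codes into readable labels; other codes become NA
    education = education_2003_revision,
    education = ifelse(education == 1, "8th grade or less",
                ifelse(education == 2, "9 - 12th grade, no diploma",
                ifelse(education == 3, "high school graduate or GED completed",
                ifelse(education == 4, "some college credit, but no degree",
                ifelse(education == 5, "Associate degree",
                ifelse(education == 6, "Bachelor's degree",
                ifelse(education == 7, "Master's degree",
                ifelse(education == 8, "Doctorate or professional degree", NA)))))))),
    # Recode the race codes into readable labels; other codes become NA
    race = race_recode_3,
    race = ifelse(race == 1, "White",
           ifelse(race == 2, "Black",
           ifelse(race == 3, "Other", NA)))
  ) %>%
  select(age, marital_status, sex, education, race, Suicide) %>%
  # age is a text label at this point, so this comparison is made on strings:
  # it drops NA ages and also the "15-24" and "25-34" labels
  filter(age > 3)

head(Suicidedata)

# Drop rows with missing values in the outcome, race, or education
Suicidedata2 <- Suicidedata %>%
  filter(!is.na(Suicide), !is.na(race), !is.na(education))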

Mutating Variable

The dependent variable for this analysis is derived from manner_of_death, which describes the manner of every death that occurred in 2015. To isolate suicides, I used mutate to create a 0/1 coding: if the manner of death was suicide (coded 2 in the original dataset), the new variable equals 1, and for any other manner of death it equals 0.

Recoding Variables

After reviewing the data's variables, I selected 6 variables that I found useful for analyzing suicides. Using the code above, I recoded three of them to be readable to the public. Because this data had already been processed, several variables came with multiple recodes that did not fit my assessment of the data well. age_recode_12, itself a recode by the previous examiner, was recoded once again, this time into a categorical variable. Education was also recoded from education_2003_revision, an integer, into a categorical variable naming the different levels of education. The data also treated race as an integer; to better evaluate this independent variable, it was changed into a categorical variable where 1 is White, 2 is Black, and 3 is Other. At the end of each recode I used NA, so that observations with any other code are kept as missing rather than lost.
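As an alternative to nested ifelse calls, the same kind of recode can be sketched with a named lookup vector. The example below is only an illustration on made-up codes: the vector names (race_code, race_labels, manner_code) are hypothetical and the values are not taken from the dataset.

```r
# Toy race codes (hypothetical values, not taken from the dataset)
race_code <- c(1, 2, 3, 1, 9)

# Named lookup vector: names are the integer codes, values are the labels
race_labels <- c("1" = "White", "2" = "Black", "3" = "Other")

# Index by the code converted to text; unmatched codes come back as NA
race <- unname(race_labels[as.character(race_code)])

# The 0/1 outcome is built the same way as in the analysis
manner_code <- c(2, 1, 7, 2, NA)
suicide <- ifelse(manner_code == 2, 1, 0)

print(race)
print(suicide)
```

A lookup vector keeps the code-to-label mapping in one place, which may be easier to check against the codebook than a chain of nested conditions.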

Final Examined Variables

The final list of variables used for the binary models is as follows:

* Suicide - whether the person committed suicide: 1 = yes, 0 = no.
* Age - categorical: age intervals.
* Sex - the sex of the deceased person, Female or Male.
* Race - categorical: whether the deceased person was White, Black, or Other.
* Marital - the marital status of the deceased person.
* Education - level of educational attainment of the deceased person. The data does not include income, which would have been a great variable to use, so educational attainment is used in its place, on the assumption that a higher education level tends to come with higher pay.

All NA outcomes were filtered.

Unique

unique(Suicidedata2$Suicide)
## [1] 0 1
unique(Suicidedata2$age)
## [1] "65-74" "35-44" "55-64" "45-54"
unique(Suicidedata2$education)
## [1] "Bachelor's degree"                    
## [2] "high school graduate or GED completed"
## [3] "9 - 12th grade, no diploma"           
## [4] "some college credit, but no degree"   
## [5] "Associate degree"                     
## [6] "Master's degree"                      
## [7] "8th grade or less"                    
## [8] "Doctorate or professional degree"
unique(Suicidedata2$race)
## [1] "White" "Black" "Other"

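The unique function allowed me to verify that my recoding and mutation were carried over into the new dataset. Note that only four age groups remain: the 15-24 and 25-34 groups are absent because filter(age > 3) runs after age has been recoded to text labels, so the comparison is made on strings rather than on the numeric codes.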
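The four age groups in the unique(Suicidedata2$age) output can be traced to how R compares text with a number. The sketch below reproduces the filter(age > 3) condition on the six recoded labels using base R only; the vector names are hypothetical.

```r
# The age labels produced by the recode, plus a missing value
age <- c("15-24", "25-34", "35-44", "45-54", "55-64", "65-74", NA)

# Comparing text with a number converts the number to "3" and compares strings
keep <- age > 3
print(keep)

# filter() keeps only rows where the condition is TRUE, so FALSE and NA rows go
kept_labels <- age[which(keep)]
print(kept_labels)
```

If the intent was to keep every recoded age group and drop only the unmatched codes, a condition such as !is.na(age) would do that, but it would also change the sample and therefore the model results reported here.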

Binary Logistic Regression Models

Model 1

m1<- glm(Suicide ~ sex, family = binomial, data = Suicidedata2)
summary(m1)
## 
## Call:
## glm(formula = Suicide ~ sex, family = binomial, data = Suicidedata2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.2759  -0.2759  -0.2759  -0.1925   2.8278  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -3.97976    0.01243 -320.25   <2e-16 ***
## sexM         0.73035    0.01440   50.71   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 235967  on 884220  degrees of freedom
## Residual deviance: 233125  on 884219  degrees of freedom
## AIC: 233129
## 
## Number of Fisher Scoring iterations: 6

For the first model, I created a simple binary logistic regression to examine the sex of the deceased person. In Model 1, being male increases the log odds of the death being a suicide by 0.73035. The odds ratio can be calculated by exponentiating 0.73035 to get 2.0758, which means the odds of a male's death being a suicide are about 2.08 times the odds for a female, an increase of roughly 108%. This coefficient is statistically significant (p < 2e-16).
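The step from a log-odds coefficient to an odds ratio can be written out in a few lines. This is a minimal sketch that uses the sexM estimate printed in the Model 1 summary; the variable names are hypothetical.

```r
# Coefficient for sexM as printed in the Model 1 summary
b_sexM <- 0.73035

# Odds ratio: exponentiate the log-odds coefficient
odds_ratio <- exp(b_sexM)

# Percent change in the odds relative to the reference group (female)
pct_change <- (odds_ratio - 1) * 100

print(round(odds_ratio, 4))
print(round(pct_change, 1))
```

With the fitted model object available, exp(coef(m1)) gives the same odds ratios for every coefficient at once.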

Model 2

m2 <- glm(Suicide ~ sex + race + education, family = binomial,  data = Suicidedata2)
summary(m2)
## 
## Call:
## glm(formula = Suicide ~ sex + race + education, family = binomial, 
##     data = Suicidedata2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.3755  -0.2816  -0.2340  -0.1843   3.5226  
## 
## Coefficients:
##                                                Estimate Std. Error z value
## (Intercept)                                    -4.81670    0.05230 -92.104
## sexM                                            0.72421    0.01446  50.086
## raceOther                                      -1.38579    0.04608 -30.073
## raceWhite                                       0.02518    0.03510   0.717
## education9 - 12th grade, no diploma             0.67860    0.04397  15.435
## educationAssociate degree                       1.13237    0.04400  25.737
## educationBachelor's degree                      1.25364    0.04201  29.844
## educationDoctorate or professional degree       1.45050    0.05542  26.170
## educationhigh school graduate or GED completed  0.85956    0.03985  21.569
## educationMaster's degree                        1.20707    0.04740  25.467
## educationsome college credit, but no degree     1.10379    0.04158  26.543
##                                                Pr(>|z|)    
## (Intercept)                                      <2e-16 ***
## sexM                                             <2e-16 ***
## raceOther                                        <2e-16 ***
## raceWhite                                         0.473    
## education9 - 12th grade, no diploma              <2e-16 ***
## educationAssociate degree                        <2e-16 ***
## educationBachelor's degree                       <2e-16 ***
## educationDoctorate or professional degree        <2e-16 ***
## educationhigh school graduate or GED completed   <2e-16 ***
## educationMaster's degree                         <2e-16 ***
## educationsome college credit, but no degree      <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 235967  on 884220  degrees of freedom
## Residual deviance: 227938  on 884210  degrees of freedom
## AIC: 227960
## 
## Number of Fisher Scoring iterations: 7

Model 2 can be interpreted as follows. Each coefficient is a change in the log odds of the death being a suicide (versus any other manner of death), holding the other variables constant. The reference categories are female, Black, and 8th grade or less.

* Being male increases the log odds by 0.72421.
* Race being Other, versus Black, decreases the log odds by 1.38579.
* Race being White, versus Black, increases the log odds by 0.02518, but this coefficient is not statistically significant (p = 0.473).
* An educational attainment of 9 - 12th grade, no diploma, versus 8th grade or less, increases the log odds by 0.67860.
* An educational attainment of high school graduate or GED completed, versus 8th grade or less, increases the log odds by 0.85956.
* An educational attainment of some college credit, but no degree, versus 8th grade or less, increases the log odds by 1.10379.
* An educational attainment of an Associate degree, versus 8th grade or less, increases the log odds by 1.13237.
* An educational attainment of a Bachelor's degree, versus 8th grade or less, increases the log odds by 1.25364.
* An educational attainment of a Master's degree, versus 8th grade or less, increases the log odds by 1.20707.
* An educational attainment of a Doctorate or professional degree, versus 8th grade or less, increases the log odds by 1.45050.
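One way to see why the raceWhite coefficient in Model 2 says little on its own is to turn its estimate and standard error into an approximate 95% interval for the odds ratio. The sketch below uses a Wald interval, which is an assumption on my part (the report itself does not compute intervals), and takes the two numbers from the printed summary.

```r
# Estimate and standard error for raceWhite as printed in the Model 2 summary
est <- 0.02518
se  <- 0.03510

# Approximate 95% Wald interval on the log-odds scale
z <- qnorm(0.975)
ci_log_odds <- c(lower = est - z * se, upper = est + z * se)

# Exponentiate to get the interval for the odds ratio
ci_odds_ratio <- exp(ci_log_odds)

print(round(ci_odds_ratio, 3))
```

An interval that contains 1 is consistent with no difference in odds between White and Black decedents in this model, which matches the p-value of 0.473.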

Model 3

m3<- glm(Suicide ~ sex*race + education, family = binomial,  data = Suicidedata2)
summary(m3)
## 
## Call:
## glm(formula = Suicide ~ sex * race + education, family = binomial, 
##     data = Suicidedata2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.3757  -0.2814  -0.2345  -0.1840   3.6155  
## 
## Coefficients:
##                                                Estimate Std. Error z value
## (Intercept)                                    -4.53628    0.07047 -64.376
## sexM                                            0.32494    0.07312   4.444
## raceOther                                      -1.99802    0.09048 -22.083
## raceWhite                                      -0.25508    0.06121  -4.167
## education9 - 12th grade, no diploma             0.67976    0.04397  15.461
## educationAssociate degree                       1.13575    0.04400  25.813
## educationBachelor's degree                      1.25661    0.04201  29.914
## educationDoctorate or professional degree       1.45521    0.05543  26.255
## educationhigh school graduate or GED completed  0.86145    0.03985  21.615
## educationMaster's degree                        1.21109    0.04740  25.550
## educationsome college credit, but no degree     1.10658    0.04159  26.610
## sexM:raceOther                                  0.83249    0.10544   7.895
## sexM:raceWhite                                  0.39585    0.07464   5.303
##                                                Pr(>|z|)    
## (Intercept)                                     < 2e-16 ***
## sexM                                           8.83e-06 ***
## raceOther                                       < 2e-16 ***
## raceWhite                                      3.09e-05 ***
## education9 - 12th grade, no diploma             < 2e-16 ***
## educationAssociate degree                       < 2e-16 ***
## educationBachelor's degree                      < 2e-16 ***
## educationDoctorate or professional degree       < 2e-16 ***
## educationhigh school graduate or GED completed  < 2e-16 ***
## educationMaster's degree                        < 2e-16 ***
## educationsome college credit, but no degree     < 2e-16 ***
## sexM:raceOther                                 2.90e-15 ***
## sexM:raceWhite                                 1.14e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 235967  on 884220  degrees of freedom
## Residual deviance: 227875  on 884208  degrees of freedom
## AIC: 227901
## 
## Number of Fisher Scoring iterations: 8

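The first step in analyzing this model was to check the statistical significance of the coefficients. All coefficients in the model are statistically significant.

This model uses the interaction of two independent variables (sex*race) while controlling for education. With an interaction in the model, the sexM coefficient (0.32494) is the male versus female difference in log odds for the reference race, Black. The interaction terms show how that difference changes for the other races: for Other it is 0.83249 larger (0.32494 + 0.83249 = 1.15743), and for White it is 0.39585 larger (0.32494 + 0.39585 = 0.72079). Comparing males across races, a male whose race is Other has log odds of death being a suicide that are 1.16553 lower than a Black male's (-1.99802 + 0.83249), holding education constant. From this analysis, race does interact with sex in the odds of a death being a suicide, and the education coefficients again show the log odds rising with education relative to 8th grade or less.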
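Because an interaction coefficient only has meaning in combination with the main effects, it can help to add the terms up explicitly. The sketch below does this for the male versus female difference within each race, using the estimates printed in the Model 3 summary; the vector names are hypothetical.

```r
# Coefficients as printed in the Model 3 summary
b <- c(sexM = 0.32494, raceOther = -1.99802, raceWhite = -0.25508,
       sexM_raceOther = 0.83249, sexM_raceWhite = 0.39585)

# Male versus female difference in log odds within each race
sex_effect <- c(
  Black = b[["sexM"]],
  Other = b[["sexM"]] + b[["sexM_raceOther"]],
  White = b[["sexM"]] + b[["sexM_raceWhite"]]
)

# The differences on the log-odds scale, then as odds ratios
print(round(sex_effect, 5))
print(round(exp(sex_effect), 2))
```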

Likelihood ratio test

anova(m1, m2, m3, test = "Chisq")

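The anova function compares the deviance of each nested model to show which fits best. Deviance is a measure of error, so a lower deviance means a better fit, but adding terms can only lower it; the likelihood ratio test checks whether the drop is larger than chance would explain. Using the Deviance row of the table below, Model 2 lowers the deviance by 5186.31 on 9 degrees of freedom compared with Model 1, and Model 3 lowers it by a further 63.78 on 2 degrees of freedom. Both drops are far larger than the chi-square critical values at the 0.001 level (27.88 for 9 degrees of freedom, 13.82 for 2), so the third model fits best.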
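Since the anova output is not shown, the likelihood ratio test can be reproduced by hand from the deviances and residual degrees of freedom reported for the three models. This is a sketch of the calculation that anova(..., test = "Chisq") performs for nested binomial models; the vector names are hypothetical.

```r
# Deviances from the comparison table and residual df from the model summaries
model_deviance <- c(m1 = 233124.66, m2 = 227938.35, m3 = 227874.57)
df_resid       <- c(m1 = 884219, m2 = 884210, m3 = 884208)

# Likelihood ratio statistic: drop in deviance between successive nested models
lr_stat <- -diff(model_deviance)
lr_df   <- -diff(df_resid)

# Upper-tail chi-square probability for each comparison
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(data.frame(lr_stat, lr_df, p_value))
```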

library(texreg)
htmlreg(list(m1,m2,m3))
Statistical models

                                                 Model 1       Model 2       Model 3
(Intercept)                                      -3.98***      -4.82***      -4.54***
                                                 (0.01)        (0.05)        (0.07)
sexM                                             0.73***       0.72***       0.32***
                                                 (0.01)        (0.01)        (0.07)
raceOther                                                      -1.39***      -2.00***
                                                               (0.05)        (0.09)
raceWhite                                                      0.03          -0.26***
                                                               (0.04)        (0.06)
education9 - 12th grade, no diploma                            0.68***       0.68***
                                                               (0.04)        (0.04)
educationAssociate degree                                      1.13***       1.14***
                                                               (0.04)        (0.04)
educationBachelor’s degree                                     1.25***       1.26***
                                                               (0.04)        (0.04)
educationDoctorate or professional degree                      1.45***       1.46***
                                                               (0.06)        (0.06)
educationhigh school graduate or GED completed                 0.86***       0.86***
                                                               (0.04)        (0.04)
educationMaster’s degree                                       1.21***       1.21***
                                                               (0.05)        (0.05)
educationsome college credit, but no degree                    1.10***       1.11***
                                                               (0.04)        (0.04)
sexM:raceOther                                                               0.83***
                                                                             (0.11)
sexM:raceWhite                                                               0.40***
                                                                             (0.07)
AIC                                              233128.66     227960.35     227900.57
BIC                                              233152.04     228088.97     228052.58
Log Likelihood                                   -116562.33    -113969.17    -113937.29
Deviance                                         233124.66     227938.35     227874.57
Num. obs.                                        884221        884221        884221

*** p < 0.001, ** p < 0.01, * p < 0.05
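The AIC and BIC rows of the table can be recovered from the Deviance row. For a 0/1 outcome the deviance equals minus twice the log likelihood (the table bears this out), so AIC is the deviance plus 2k and BIC is the deviance plus k log(n), where k is the number of estimated coefficients and n the number of observations. A short check, with hypothetical vector names:

```r
# Deviance, number of coefficients (including the intercept), and sample size
model_deviance <- c(m1 = 233124.66, m2 = 227938.35, m3 = 227874.57)
k <- c(m1 = 2, m2 = 11, m3 = 13)
n <- 884221

# Information criteria: both penalise the deviance for model size
aic <- model_deviance + 2 * k
bic <- model_deviance + k * log(n)

print(round(rbind(aic, bic), 2))
```

BIC applies a heavier penalty per coefficient than AIC at this sample size, and Model 3 still has the lowest value on both.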

Plotting

Plot 1

library(visreg)
visreg(m3, "race", scale = "response")
## Conditions used in construction of plot
## sex: M
## education: high school graduate or GED completed

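Plot one shows predicted probabilities rather than odds, because scale = "response" is used, for a male with a high school diploma or GED (the conditions visreg reports). Working from the Model 3 coefficients, White decedents have the highest predicted probability of their death being a suicide, at approximately .039. Black decedents are only about .005 lower, at approximately .034. Of the three races, Other has by far the lowest predicted probability, at approximately .011.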
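The values on this plot can be checked directly from the Model 3 coefficients by adding up the linear predictor and applying the inverse logit. The sketch below does so for the conditions visreg reports (male, high school graduate or GED completed); the variable names are hypothetical and the estimates are copied from the printed summary.

```r
# Coefficients as printed in the Model 3 summary
intercept  <- -4.53628
b_sexM     <-  0.32494
b_hs       <-  0.86145   # high school graduate or GED completed
b_race     <- c(Black = 0, Other = -1.99802, White = -0.25508)
b_sex_race <- c(Black = 0, Other =  0.83249, White =  0.39585)

# Linear predictor (log odds) for a male high school graduate of each race
log_odds <- intercept + b_sexM + b_hs + b_race + b_sex_race

# Inverse logit turns log odds into a probability
prob <- plogis(log_odds)

print(round(prob, 3))
```

With the fitted model available, predict(m3, newdata = ..., type = "response") would give the same probabilities without copying coefficients by hand.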

Plot 2

visreg(m3, "sex", by = "race", scale = "response")

Plot two divides the sexes by race. It shows that men of every race have a higher predicted probability of death being a suicide than women of the same race. Comparing the sexes within each race, and holding education at high school graduate or GED as in Plot 1, the Model 3 coefficients give White men a predicted probability about .02 higher than White women, men of the Other race about .007 higher than Other women, and Black men about .009 higher than Black women.

Cross-analyzing races:

* White males have the highest predicted probability compared to both males and females of any race.
* Black women have a higher predicted probability than both males and females of the Other race.
* The Other race altogether has the lowest predicted probability of death being a suicide.

visreg(m3, "race", by = "education", scale = "response")

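Plot three compares race by education. For all races, the predicted probability of death resulting in suicide generally climbs as education increases, but the Other race shows the smallest increases compared to the Black and White races. Because Model 3 has no race by education interaction, the education effect is the same for every race on the log-odds scale; the smaller increases for Other reflect its much lower baseline probability.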