The data I used come from https://www.kaggle.com/omaymas/us-suicide-cases-in-2014. Despite the dataset's name, they represent deaths in 2015 collected by the Centers for Disease Control and Prevention (CDC). Because the dataset has a very large number of observations and over 77 variables, I created a subset for my questions about suicide in the US.
The first step was to download the data, view it, and assess which variables are useful for my analysis.
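library(lmtest)     # coeftest(), lrtest(), and other tests for fitted models
library(tidyverse)  # attaches dplyr, readr, ggplot2, and related packages
library(readr)      # read_csv(); already attached by tidyverse
Suicide1 <- read_csv("C:\\Users\\Cespi\\Documents\\712\\2015_data.csv")  # local copy of the downloaded data
head(Suicide1)      # preview the first rows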
The variables in the original dataset were previously coded. For this analysis, I first identified which coded variables I wanted to use. I then created the dependent variable with mutate() and recoded the independent variables so the data would be easier to analyze. The original variables used are:
* race_recode_3: race stored as an integer, where 1 = White, 2 = Black, and 3 = Other.
* manner_of_death: an integer code for the manner of each death that occurred in 2015.
* age_recode_12: stored as a character variable, although its values are numeric codes for age intervals.
* education_2003_revision: a numeric variable with 9 codes for levels of educational attainment.
* marital_status: the marital status of the deceased person.
* sex: the sex of the deceased person.
library(dplyr)  # already attached by tidyverse
Suicidedata <- Suicide1 %>%
  mutate(
    Suicide = ifelse(manner_of_death == 2, 1, 0),  # 1 = suicide (code 2), 0 = any other manner
    age = as.numeric(age_recode_12),                # codes 4 to 9 cover ages 15 to 74
    age = ifelse(age == 4, "15-24",
          ifelse(age == 5, "25-34",
          ifelse(age == 6, "35-44",
          ifelse(age == 7, "45-54",
          ifelse(age == 8, "55-64",
          ifelse(age == 9, "65-74", NA)))))),       # any other code becomes NA
    education = education_2003_revision,
    education = ifelse(education == 1, "8th grade or less",
                ifelse(education == 2, "9 - 12th grade, no diploma",
                ifelse(education == 3, "high school graduate or GED completed",
                ifelse(education == 4, "some college credit, but no degree",
                ifelse(education == 5, "Associate degree",
                ifelse(education == 6, "Bachelor's degree",
                ifelse(education == 7, "Master's degree",
                ifelse(education == 8, "Doctorate or professional degree", NA)))))))),
    race = race_recode_3,
    race = ifelse(race == 1, "White",
           ifelse(race == 2, "Black",
           ifelse(race == 3, "Other", NA)))
  ) %>%
  select(age, marital_status, sex, education, race, Suicide) %>%
  filter(age > 3)  # age is a character label here, so this compares text: only "35-44" to "65-74" pass
head(Suicidedata)
Suicidedata2 <- Suicidedata %>%
  filter(!is.na(Suicide), !is.na(race), !is.na(education))  # drop rows missing the outcome, race, or education
The dependent variable for this analysis, Suicide, is derived from manner_of_death, which describes the manner of every death that occurred in 2015. To identify suicides, I used mutate() to create a 0/1 variable: it equals 1 if the manner of death was suicide (code 2 in the original dataset) and 0 for any other manner of death.
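A toy illustration of the 0/1 coding, using a made-up vector of manner_of_death codes rather than the real data. It also shows why the later filter(!is.na(Suicide)) step matters: ifelse() passes a missing code through as NA instead of coding it 0.

```r
# Hypothetical manner_of_death codes; 2 = suicide, NA = manner not recorded
manner_of_death <- c(1, 2, 7, 2, NA, 3)

# Same rule as the mutate() call: 1 if suicide, 0 for any other manner
Suicide <- ifelse(manner_of_death == 2, 1, 0)
print(Suicide)

# Missing codes stay NA, so they must be filtered out before modelling
print(sum(is.na(Suicide)))
```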
After reviewing the dataset's variables, I selected six that I found useful for analyzing suicide. Because the data had already been processed, several variables existed in multiple recoded versions that did not fit my analysis, so I recoded three of them into readable labels. age_recode_12, itself already a recode in the original data, was recoded again into a categorical variable of age intervals. Education was recoded from the integer variable education_2003_revision into a categorical variable with labeled levels of attainment. Race, which the data also stored as an integer, was changed into a categorical variable: 1 = White, 2 = Black, 3 = Other. Each nested ifelse() ends with NA, so any code not listed is marked as missing rather than assigned to a category.
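As a hedged alternative to the nested ifelse() calls (not the method used above), a named lookup vector in base R performs the same recoding in one step. The codes in the example are made up; codes with no entry in the vector, such as 9 here, come back as NA, matching the NA fallback in the nested version.

```r
# Labels for education_2003_revision codes 1 to 8, as used in the recoding above
edu_labels <- c(
  "1" = "8th grade or less",
  "2" = "9 - 12th grade, no diploma",
  "3" = "high school graduate or GED completed",
  "4" = "some college credit, but no degree",
  "5" = "Associate degree",
  "6" = "Bachelor's degree",
  "7" = "Master's degree",
  "8" = "Doctorate or professional degree"
)

# Hypothetical codes; 9 has no label and should become NA
codes <- c(3, 6, 9, 1)

# Index the lookup vector by the code as a string, then drop the names
education <- unname(edu_labels[as.character(codes)])
print(education)
```

The lookup vector keeps every label in one place, which makes a missing or mistyped category easier to spot than in eight levels of nested parentheses.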
The final variables used for the binary logistic models are:
* Suicide: whether the person died by suicide (1 = yes, 0 = no).
* Age: categorical age intervals.
* Sex: the sex of the deceased person, Female or Male.
* Race: categorical; whether the deceased person was White, Black, or Other.
* Marital status: the marital status of the deceased person.
* Education: the deceased person's level of educational attainment. The data do not include income, which would have been a valuable variable, so educational attainment serves as a rough proxy, on the assumption that higher attainment generally comes with higher pay.
Rows with a missing value for Suicide, race, or education were then removed.
unique(Suicidedata2$Suicide)
## [1] 0 1
unique(Suicidedata2$age)
## [1] "65-74" "35-44" "55-64" "45-54"
unique(Suicidedata2$education)
## [1] "Bachelor's degree"
## [2] "high school graduate or GED completed"
## [3] "9 - 12th grade, no diploma"
## [4] "some college credit, but no degree"
## [5] "Associate degree"
## [6] "Master's degree"
## [7] "8th grade or less"
## [8] "Doctorate or professional degree"
unique(Suicidedata2$race)
## [1] "White" "Black" "Other"
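The unique() calls confirm that the recoding and mutation carried through to the new dataset. They also show that only four of the six age labels remain. Because age already holds character labels when filter(age > 3) runs, R compares the labels with "3" as text, so "15-24" and "25-34" are dropped along with ages recoded to NA. The models below do not include age, but their sample is limited to decedents aged 35 to 74 as a result.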
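A minimal base-R reproduction of why only four age groups remain, using the six labels from the recode rather than the real data. Once age holds character labels, age > 3 is a string comparison against "3", so "15-24" and "25-34" fail it, while the same test on the numeric codes keeps all six groups.

```r
# Age labels as they exist after the nested ifelse() recode
age_labels <- c("15-24", "25-34", "35-44", "45-54", "55-64", "65-74")

# Comparing a character vector with 3 coerces 3 to "3" and compares text
print(age_labels > 3)

# The same comparison on the numeric codes 4 to 9 keeps every group
age_codes <- 4:9
print(age_codes > 3)
```

Filtering on the numeric code before relabelling, or using filter(!is.na(age)) after it, is one way to keep the presumably intended 15 to 74 range; doing so would change the sample the models are fit on.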
m1<- glm(Suicide ~ sex, family = binomial, data = Suicidedata2)
summary(m1)
##
## Call:
## glm(formula = Suicide ~ sex, family = binomial, data = Suicidedata2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.2759 -0.2759 -0.2759 -0.1925 2.8278
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.97976 0.01243 -320.25 <2e-16 ***
## sexM 0.73035 0.01440 50.71 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 235967 on 884220 degrees of freedom
## Residual deviance: 233125 on 884219 degrees of freedom
## AIC: 233129
##
## Number of Fisher Scoring iterations: 6
Model 1 is a simple binary logistic regression of Suicide on the sex of the deceased person. For males, the log odds that a death is a suicide are 0.73035 higher than for females. Exponentiating the coefficient gives an odds ratio of exp(0.73035) ≈ 2.076: the odds that a male's death is a suicide are about 2.08 times the odds for a female, an increase of roughly 108%. The coefficient is statistically significant (p < 2e-16).
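The odds-ratio arithmetic can be checked with the estimates copied from summary(m1). This sketch also converts the log odds into predicted probabilities with plogis(); with the fitted model in memory, exp(coef(m1)) returns the same odds ratio directly.

```r
# Estimates copied from summary(m1)
b0     <- -3.97976  # intercept: log odds that a female's death is a suicide
b_male <-  0.73035  # sexM: difference in log odds, males vs females

# Odds ratio for males vs females, and the implied percent change in odds
odds_ratio <- exp(b_male)
pct_change <- 100 * (odds_ratio - 1)
print(round(c(odds_ratio = odds_ratio, pct_change = pct_change), 3))

# Predicted probabilities that a death is a suicide, by sex
p_female <- plogis(b0)
p_male   <- plogis(b0 + b_male)
print(round(c(female = p_female, male = p_male), 4))
```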
m2 <- glm(Suicide ~ sex + race + education, family = binomial, data = Suicidedata2)
summary(m2)
##
## Call:
## glm(formula = Suicide ~ sex + race + education, family = binomial,
## data = Suicidedata2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.3755 -0.2816 -0.2340 -0.1843 3.5226
##
## Coefficients:
## Estimate Std. Error z value
## (Intercept) -4.81670 0.05230 -92.104
## sexM 0.72421 0.01446 50.086
## raceOther -1.38579 0.04608 -30.073
## raceWhite 0.02518 0.03510 0.717
## education9 - 12th grade, no diploma 0.67860 0.04397 15.435
## educationAssociate degree 1.13237 0.04400 25.737
## educationBachelor's degree 1.25364 0.04201 29.844
## educationDoctorate or professional degree 1.45050 0.05542 26.170
## educationhigh school graduate or GED completed 0.85956 0.03985 21.569
## educationMaster's degree 1.20707 0.04740 25.467
## educationsome college credit, but no degree 1.10379 0.04158 26.543
## Pr(>|z|)
## (Intercept) <2e-16 ***
## sexM <2e-16 ***
## raceOther <2e-16 ***
## raceWhite 0.473
## education9 - 12th grade, no diploma <2e-16 ***
## educationAssociate degree <2e-16 ***
## educationBachelor's degree <2e-16 ***
## educationDoctorate or professional degree <2e-16 ***
## educationhigh school graduate or GED completed <2e-16 ***
## educationMaster's degree <2e-16 ***
## educationsome college credit, but no degree <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 235967 on 884220 degrees of freedom
## Residual deviance: 227938 on 884210 degrees of freedom
## AIC: 227960
##
## Number of Fisher Scoring iterations: 7
Model 2 adds race and education. Holding the other predictors constant:
* Sex: the log odds that a male's death is a suicide (versus any other manner of death) are 0.72421 higher than a female's.
* Race (reference: Black): for decedents of Other race, the log odds are 1.38579 lower than for Black decedents. For White decedents they are 0.02518 higher, but this difference is not statistically significant (p = 0.473).
* Education (reference: 8th grade or less): every other category has higher log odds that death was a suicide: 9 - 12th grade, no diploma (0.67860); high school graduate or GED completed (0.85956); some college credit, but no degree (1.10379); Associate degree (1.13237); Bachelor's degree (1.25364); Master's degree (1.20707); Doctorate or professional degree (1.45050). The coefficients rise with attainment, except that Master's is slightly below Bachelor's.
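The education coefficients are easier to compare as odds ratios. This sketch exponentiates the estimates copied from summary(m2); each value is the odds that a death was a suicide relative to the 8th-grade-or-less reference group, with sex and race held constant.

```r
# Education coefficients copied from summary(m2); reference = "8th grade or less"
edu_coef <- c(
  "9 - 12th grade, no diploma"            = 0.67860,
  "high school graduate or GED completed" = 0.85956,
  "some college credit, but no degree"    = 1.10379,
  "Associate degree"                      = 1.13237,
  "Bachelor's degree"                     = 1.25364,
  "Master's degree"                       = 1.20707,
  "Doctorate or professional degree"      = 1.45050
)

# Odds ratios relative to the reference category
edu_or <- exp(edu_coef)
print(round(edu_or, 2))
```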
m3<- glm(Suicide ~ sex*race + education, family = binomial, data = Suicidedata2)
summary(m3)
##
## Call:
## glm(formula = Suicide ~ sex * race + education, family = binomial,
## data = Suicidedata2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.3757 -0.2814 -0.2345 -0.1840 3.6155
##
## Coefficients:
## Estimate Std. Error z value
## (Intercept) -4.53628 0.07047 -64.376
## sexM 0.32494 0.07312 4.444
## raceOther -1.99802 0.09048 -22.083
## raceWhite -0.25508 0.06121 -4.167
## education9 - 12th grade, no diploma 0.67976 0.04397 15.461
## educationAssociate degree 1.13575 0.04400 25.813
## educationBachelor's degree 1.25661 0.04201 29.914
## educationDoctorate or professional degree 1.45521 0.05543 26.255
## educationhigh school graduate or GED completed 0.86145 0.03985 21.615
## educationMaster's degree 1.21109 0.04740 25.550
## educationsome college credit, but no degree 1.10658 0.04159 26.610
## sexM:raceOther 0.83249 0.10544 7.895
## sexM:raceWhite 0.39585 0.07464 5.303
## Pr(>|z|)
## (Intercept) < 2e-16 ***
## sexM 8.83e-06 ***
## raceOther < 2e-16 ***
## raceWhite 3.09e-05 ***
## education9 - 12th grade, no diploma < 2e-16 ***
## educationAssociate degree < 2e-16 ***
## educationBachelor's degree < 2e-16 ***
## educationDoctorate or professional degree < 2e-16 ***
## educationhigh school graduate or GED completed < 2e-16 ***
## educationMaster's degree < 2e-16 ***
## educationsome college credit, but no degree < 2e-16 ***
## sexM:raceOther 2.90e-15 ***
## sexM:raceWhite 1.14e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 235967 on 884220 degrees of freedom
## Residual deviance: 227875 on 884208 degrees of freedom
## AIC: 227901
##
## Number of Fisher Scoring iterations: 8
I first checked the statistical significance of the coefficients: in Model 3, every coefficient is statistically significant at the 0.001 level.
Model 3 adds an interaction between sex and race (sex*race) while controlling for education. With the interaction in the model, the sexM coefficient (0.32494) is the male-female difference in log odds for Black decedents only, and raceOther and raceWhite compare women of each race with Black women. The interaction terms show how the male-female gap changes by race: for decedents of Other race it is 0.32494 + 0.83249 = 1.15743, and for White decedents it is 0.32494 + 0.39585 = 0.72079. Among men only, the log odds that death was a suicide are 1.99802 - 0.83249 = 1.16553 lower for Other race than for Black men, and 0.14077 higher for White men. Race therefore interacts with sex, while the education coefficients are nearly unchanged from Model 2 and still rise with attainment.
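The combined effects quoted above are sums of Model 3 coefficients. This sketch reproduces them from the estimates copied from summary(m3); it gives point estimates only, since standard errors for the sums would also need the covariance matrix from vcov(m3).

```r
# Coefficients copied from summary(m3); reference levels are female and Black
b <- c("sexM" = 0.32494, "raceOther" = -1.99802, "raceWhite" = -0.25508,
       "sexM:raceOther" = 0.83249, "sexM:raceWhite" = 0.39585)

# Male-female difference in log odds within each race
sex_gap <- c(
  Black = b[["sexM"]],
  Other = b[["sexM"]] + b[["sexM:raceOther"]],
  White = b[["sexM"]] + b[["sexM:raceWhite"]]
)
print(sex_gap)

# Difference in log odds from Black men, among men only
race_gap_men <- c(
  Other = b[["raceOther"]] + b[["sexM:raceOther"]],
  White = b[["raceWhite"]] + b[["sexM:raceWhite"]]
)
print(race_gap_men)
```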
anova(m1, m2, m3, test = "Chisq")
The anova() call compares the three nested models with likelihood-ratio (chi-square) tests. Deviance measures lack of fit, so a lower deviance means the model fits the data better. Model 3 has the lowest residual deviance (227875, compared with 227938 for Model 2 and 233125 for Model 1) and also the lowest AIC, so it fits best of the three.
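Because the anova() output is not shown, the likelihood-ratio tests can be reproduced by hand from the reported deviances: the test statistic is the drop in deviance between nested models, with degrees of freedom equal to the drop in residual degrees of freedom. This is a sketch using the values from the model summaries and the texreg table.

```r
# Residual deviances and residual degrees of freedom reported for the models
dev      <- c(m1 = 233124.66, m2 = 227938.35, m3 = 227874.57)
resid_df <- c(m1 = 884219,    m2 = 884210,    m3 = 884208)

# Likelihood-ratio statistic = drop in deviance; its df = drop in residual df
lr_12 <- dev[["m1"]] - dev[["m2"]]
lr_23 <- dev[["m2"]] - dev[["m3"]]
df_12 <- resid_df[["m1"]] - resid_df[["m2"]]
df_23 <- resid_df[["m2"]] - resid_df[["m3"]]

# Upper-tail chi-square probabilities
p_12 <- pchisq(lr_12, df = df_12, lower.tail = FALSE)
p_23 <- pchisq(lr_23, df = df_23, lower.tail = FALSE)
print(c(lr_12 = lr_12, df_12 = df_12, p_12 = p_12))
print(c(lr_23 = lr_23, df_23 = df_23, p_23 = p_23))
```

Both p-values are far below 0.001, so adding race and education, and then the sex-by-race interaction, each improves fit significantly.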
library(texreg)
htmlreg(list(m1,m2,m3))
| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| (Intercept) | -3.98*** | -4.82*** | -4.54*** |
| | (0.01) | (0.05) | (0.07) |
| sexM | 0.73*** | 0.72*** | 0.32*** |
| | (0.01) | (0.01) | (0.07) |
| raceOther | | -1.39*** | -2.00*** |
| | | (0.05) | (0.09) |
| raceWhite | | 0.03 | -0.26*** |
| | | (0.04) | (0.06) |
| education9 - 12th grade, no diploma | | 0.68*** | 0.68*** |
| | | (0.04) | (0.04) |
| educationAssociate degree | | 1.13*** | 1.14*** |
| | | (0.04) | (0.04) |
| educationBachelor’s degree | | 1.25*** | 1.26*** |
| | | (0.04) | (0.04) |
| educationDoctorate or professional degree | | 1.45*** | 1.46*** |
| | | (0.06) | (0.06) |
| educationhigh school graduate or GED completed | | 0.86*** | 0.86*** |
| | | (0.04) | (0.04) |
| educationMaster’s degree | | 1.21*** | 1.21*** |
| | | (0.05) | (0.05) |
| educationsome college credit, but no degree | | 1.10*** | 1.11*** |
| | | (0.04) | (0.04) |
| sexM:raceOther | | | 0.83*** |
| | | | (0.11) |
| sexM:raceWhite | | | 0.40*** |
| | | | (0.07) |
| AIC | 233128.66 | 227960.35 | 227900.57 |
| BIC | 233152.04 | 228088.97 | 228052.58 |
| Log Likelihood | -116562.33 | -113969.17 | -113937.29 |
| Deviance | 233124.66 | 227938.35 | 227874.57 |
| Num. obs. | 884221 | 884221 | 884221 |

Standard errors in parentheses. \*\*\* p < 0.001; \*\* p < 0.01; \* p < 0.05.
library(visreg)
visreg(m3, "race", scale = "response")
## Conditions used in construction of plot
## sex: M
## education: high school graduate or GED completed
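Plot one shows, for men whose education is high school graduate or GED completed, the predicted probability that a death was a suicide for each race. With scale = "response", visreg plots probabilities rather than odds. White decedents have the highest predicted probability, approximately 0.04; Black decedents are slightly lower, at about 0.034; and decedents of Other race are far lower, at approximately 0.011.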
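The heights in plot one can be recovered from the Model 3 coefficients: add up the linear predictor for a male high school graduate of each race and convert it with plogis(). This is a sketch using the estimates copied from summary(m3) and the plotting conditions visreg reported (sex M, education high school graduate or GED completed).

```r
# Coefficients copied from summary(m3)
intercept <- -4.53628
sexM      <-  0.32494
hs_grad   <-  0.86145   # educationhigh school graduate or GED completed
race_eff  <- c(Black = 0, Other = -1.99802, White = -0.25508)
inter_eff <- c(Black = 0, Other =  0.83249, White =  0.39585)  # sexM:race terms

# Linear predictor (log odds) for a male high school graduate of each race
eta <- intercept + sexM + hs_grad + race_eff + inter_eff

# plogis() converts log odds to a probability, the scale visreg plots here
p_male <- plogis(eta)
print(round(p_male, 3))
```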
visreg(m3, "sex", by = "race", scale = "response")
Plot two splits each race by sex. Within every race, men have a higher predicted probability than women that their death was a suicide, but the size of the gap differs by race: roughly 0.02 for White decedents, about 0.009 for Black decedents, and about 0.007 for decedents of Other race.
Comparing across races: White men have the highest predicted probability of any group. Black women have a higher predicted probability than both men and women of Other race. Decedents of Other race, of either sex, have the lowest predicted probabilities.
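The sex gaps in plot two follow from the same coefficients. This sketch assumes education is held at high school graduate or GED completed, visreg's default of the most common category, and computes predicted probabilities for women and men of each race along with the male-minus-female difference.

```r
# Coefficients copied from summary(m3); education held at high school/GED
base      <- -4.53628 + 0.86145   # intercept + educationhigh school graduate...
sexM      <-  0.32494
race_eff  <- c(Black = 0, Other = -1.99802, White = -0.25508)
inter_eff <- c(Black = 0, Other =  0.83249, White =  0.39585)  # sexM:race terms

# Predicted probabilities for women and men of each race
p_female <- plogis(base + race_eff)
p_male   <- plogis(base + race_eff + sexM + inter_eff)

# Male-minus-female gap in predicted probability within each race
gap <- p_male - p_female
print(round(rbind(female = p_female, male = p_male, gap = gap), 4))
```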
visreg(m3, "race", by = "education", scale = "response")
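Plot three compares races across education levels. For every race, the predicted probability that a death was a suicide generally rises with educational attainment (Master's is slightly below Bachelor's), but the rise is smallest for the Other race. Because Model 3 has no race-by-education interaction, the education effect on the log-odds scale is identical for all races; the smaller rise for the Other race on the probability scale reflects its much lower baseline.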
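A sketch of why the Other race shows the smallest rise, using the Model 3 coefficients for men and comparing the lowest education level with Doctorate or professional degree. The change in log odds is the same 1.45521 for every race, yet the change in probability depends on where each race starts.

```r
# Coefficients copied from summary(m3); men only
base_male <- -4.53628 + 0.32494                 # intercept + sexM
race_male <- c(Black = 0,
               Other = -1.99802 + 0.83249,      # raceOther + sexM:raceOther
               White = -0.25508 + 0.39585)      # raceWhite + sexM:raceWhite
doctorate <- 1.45521   # educationDoctorate or professional degree

p_low  <- plogis(base_male + race_male)              # 8th grade or less
p_high <- plogis(base_male + race_male + doctorate)  # doctorate or professional

# On the log-odds scale the education effect is the same for every race
print(qlogis(p_high) - qlogis(p_low))

# On the probability scale the rise is smallest where the baseline is lowest
print(round(p_high - p_low, 4))
```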