The original research topic for this homework was how ever having used any drug affects income. However, the variable for any lifetime drug use (“anydrugever”) includes marijuana use, which covers a large portion of the population, so it would show little difference between users and non-users. Crack is a drug often associated with lower-income groups. This analysis therefore looks at whether people have ever used crack and how this relates to their level of income. The hypothesis is that people who have ever used crack have a higher chance of being in the lower income levels than those who have never used crack.
The data used in this analysis are a subset of the National Survey on Drug Use and Health conducted in 2015 by the Substance Abuse and Mental Health Services Administration. Crack use is measured by the variable “crack_ever” and income is measured by the variable “personalincome.”
library(Zelig)
library(ZeligChoice)
library(faraway)
library(dplyr)
library(tidyr)
library(survival)
library(readr)
drugData <- read_csv("/Users/paulkim/Downloads/balexturner-drug-use-employment-work-absence-income-race-education/data/nsduh_workforce_adults.csv")
This R chunk recodes the variables into clearer categories. In addition, for the Zelig ordered logit model to run, the outcome variable needs to be an ordered factor. So the “factor” function, with its “levels” and “ordered” arguments, is used here to put the variables into ordered groups.
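# Recode the variables into labelled (and, where needed, ordered) factors
drug <- drugData %>%
  mutate(
    crack_ever = factor(crack_ever),
    # White is listed first, so it is the reference category for race
    race_str = factor(race_str, levels = c("White", "Hispanic", "Asian", "Black/African American",
                                           "Native American/Alaskan Native", "Hawaiian/Pacific Islander", "Mixed")),
    # Code 1 is labelled "Male"; any other non-missing code becomes "Female"
    sex = ifelse(sex == 1, "Male", "Female"),
    # Declaring sex as ordered is why the model output reports "sex.L"
    sex = factor(sex, ordered = TRUE,
                 levels = c("Male", "Female")),
    # Map the income codes 1-7 to dollar ranges; any other code becomes NA
    personalincome = ifelse(personalincome == 1, "<10,000",
                     ifelse(personalincome == 2, "10,000-19,999",
                     ifelse(personalincome == 3, "20,000-29,999",
                     ifelse(personalincome == 4, "30,000-39,999",
                     ifelse(personalincome == 5, "40,000-49,999",
                     ifelse(personalincome == 6, "50,000-74,999",
                     ifelse(personalincome == 7, ">75,000", NA))))))),
    # Ordered outcome, lowest to highest income, as the ordered logit requires
    personalincome = factor(personalincome, ordered = TRUE,
                            levels = c("<10,000", "10,000-19,999", "20,000-29,999", "30,000-39,999", "40,000-49,999",
                                       "50,000-74,999", ">75,000")),
    # Turn crack use into a 0/1 indicator (1 = has ever used crack)
    crack_ever = ifelse(crack_ever == "true", 1, 0)
  )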
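The nested “ifelse” calls for income can also be written more compactly by giving “factor” both the numeric codes and their labels. The following is only a minimal sketch and is not the code used for the results in this report. It assumes the income codes run from 1 to 7 as in the chunk above, and the vector “codes” is made-up illustration data, not values from the survey.

```r
# Labels for income codes 1 to 7, lowest to highest
income_labels <- c("<10,000", "10,000-19,999", "20,000-29,999", "30,000-39,999",
                   "40,000-49,999", "50,000-74,999", ">75,000")

# Hypothetical codes for illustration only; 9 is deliberately out of range
codes <- c(1, 4, 7, 2, 9)

# levels = the codes found in the data, labels = the text shown for each code;
# any code that is not in 1:7 becomes NA
income <- factor(codes, levels = 1:7, labels = income_labels, ordered = TRUE)

print(income)
# Because the factor is ordered, comparisons between income levels work
print(income[1] < income[3])
```

This gives the same kind of ordered factor as the nested “ifelse” version, with the labels written only once.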
z.out <- zelig(personalincome ~ crack_ever + sex + race_str, data = drug, model = "ologit", cite = FALSE)
summary(z.out)
## Model:
## Call:
## z5$zelig(formula = personalincome ~ crack_ever + sex + race_str,
## data = drug)
##
## Coefficients:
## Value Std. Error t value
## crack_ever -0.2017 0.05264 -3.832
## sex.L -0.3984 0.01411 -28.236
## race_strHispanic -0.8439 0.02685 -31.435
## race_strAsian 0.1634 0.05012 3.261
## race_strBlack/African American -0.8516 0.03098 -27.492
## race_strNative American/Alaskan Native -0.8208 0.08528 -9.625
## race_strHawaiian/Pacific Islander -0.9753 0.14098 -6.918
## race_strMixed -0.6709 0.05484 -12.233
##
## Intercepts:
## Value Std. Error t value
## <10,000|10,000-19,999 -1.5639 0.0167 -93.6504
## 10,000-19,999|20,000-29,999 -0.6186 0.0146 -42.4142
## 20,000-29,999|30,000-39,999 -0.0160 0.0141 -1.1322
## 30,000-39,999|40,000-49,999 0.4841 0.0144 33.6106
## 40,000-49,999|50,000-74,999 0.9750 0.0153 63.5652
## 50,000-74,999|>75,000 1.8589 0.0192 97.0621
##
## Residual Deviance: 118759.17
## AIC: 118787.17
## Next step: Use 'setx' method
The coefficients show that having ever used crack lowers the ordered log odds of being in a higher income level by 0.2017. The sex coefficient (sex.L) is also negative, at -0.3984. Because sex was entered as an ordered factor with the levels Male and then Female, this coefficient is a linear contrast in which Female is the higher level, so it is females who have lower ordered log odds of being in a higher income level than males. The only race showing higher ordered log odds of being in a higher income level than Whites, the reference category, is Asians. Asians have 0.1634 higher ordered log odds of being in a higher income level than Whites.
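Ordered log odds are hard to read directly, so it can help to convert them. The sketch below uses only base R and the coefficient values copied from the summary output above. It assumes that the sex.L term uses R's default polynomial contrast for an ordered factor with two levels (Male, then Female), which is what the “.L” suffix suggests. The sign and size of the male/female gap depend on that coding.

```r
# Coefficients copied from the model summary above
coef_crack <- -0.2017
coef_sex_L <- -0.3984

# Odds ratio for ever using crack: exponentiate the log-odds coefficient
or_crack <- exp(coef_crack)

# Default linear contrast for a two-level ordered factor:
# first level (Male) = -0.7071, second level (Female) = 0.7071
sex_contrast <- contr.poly(2)[, 1]

# Female-minus-Male difference in ordered log odds
female_vs_male <- coef_sex_L * (sex_contrast[2] - sex_contrast[1])

print(round(or_crack, 3))
print(round(female_vs_male, 3))
print(round(exp(female_vs_male), 3))
```

An odds ratio below 1 for crack use means lower odds of being in a higher income level. Under the assumed coding, the negative female-minus-male value means females have the lower odds.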
x.low <- setx(z.out, crack_ever = 0)
x.high <- setx(z.out, crack_ever = 1)
s.order <- sim(z.out, x = x.low, x1 = x.high)
summary(s.order)
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## <10,000 0.31712175 0.006616241 0.31726226 0.30428962 0.33003483
## 10,000-19,999 0.22729466 0.003828551 0.22734734 0.21967023 0.23457238
## 20,000-29,999 0.14135829 0.001520619 0.14140484 0.13839760 0.14403997
## 30,000-39,999 0.09674722 0.001546877 0.09674519 0.09380832 0.09991277
## 40,000-49,999 0.07207190 0.001791473 0.07208371 0.06873390 0.07554575
## 50,000-74,999 0.07973204 0.002887288 0.07970285 0.07430664 0.08570380
## >75,000 0.06567413 0.003874888 0.06549286 0.05859582 0.07314291
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 2.971 1.936985 2 1 7
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## <10,000 0.36251812 0.013975991 0.36195062 0.33443376 0.38962367
## 10,000-19,999 0.23137196 0.003401709 0.23150276 0.22473494 0.23778989
## 20,000-29,999 0.13362403 0.002837185 0.13363011 0.12834977 0.13923056
## 30,000-39,999 0.08733736 0.002971308 0.08747671 0.08165455 0.09356768
## 40,000-49,999 0.06301846 0.002813837 0.06316762 0.05756506 0.06870314
## 50,000-74,999 0.06777509 0.003728941 0.06784728 0.06041243 0.07497383
## >75,000 0.05435498 0.004045840 0.05430574 0.04669633 0.06250898
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 2.612 1.837071 2 1 7
## fd
## mean sd 50% 2.5%
## <10,000 0.045396365 0.0121150475 0.045101684 0.021240291
## 10,000-19,999 0.004077291 0.0009304976 0.004025874 0.002333805
## 20,000-29,999 -0.007734257 0.0022784076 -0.007597500 -0.012600500
## 30,000-39,999 -0.009409859 0.0024696266 -0.009374877 -0.014561079
## 40,000-49,999 -0.009053436 0.0022830850 -0.009091765 -0.013626852
## 50,000-74,999 -0.011956953 0.0029645027 -0.012058809 -0.017683398
## >75,000 -0.011319151 0.0028193198 -0.011357269 -0.016952620
## 97.5%
## <10,000 0.069794530
## 10,000-19,999 0.005968267
## 20,000-29,999 -0.003375745
## 30,000-39,999 -0.004457083
## 40,000-49,999 -0.004455775
## 50,000-74,999 -0.006050381
## >75,000 -0.005757636
Here x represents people who have never used crack and x1 represents people who have ever used crack. People who have never used crack have a lower chance of being in the less than $10,000 range (about 0.317) than people who have used crack (about 0.363). The first difference for the less than $10,000 range is about 0.045, which means the probability of being in this range is about 4.5 percentage points higher for people who have ever used crack. At the higher levels of income, $50,000-$74,999 and greater than $75,000, the first differences are negative. They show that people who have ever used crack have about a 0.012 and a 0.011 lower probability of being in the $50,000-$74,999 and greater than $75,000 levels of income, respectively. The simulated 95% intervals for these first differences do not include zero. These results show that people who have ever used crack have a higher chance of being in the lowest income range of less than $10,000, and a lower chance of being in a higher income range, than people who have never used crack. The results support the hypothesis for this analysis: people who have ever used crack have a higher chance of being in a lower income level than those who have never used crack.
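The first differences are the expected values for x1 minus the expected values for x. As a check, the short sketch below recomputes them in base R from the mean expected values copied from the simulation output above. The object names are only for illustration. Small differences from the “fd” table are possible because the output is rounded.

```r
income_levels <- c("<10,000", "10,000-19,999", "20,000-29,999", "30,000-39,999",
                   "40,000-49,999", "50,000-74,999", ">75,000")

# Mean expected values copied from "sim x" (never used crack)
ev_never <- c(0.31712175, 0.22729466, 0.14135829, 0.09674722,
              0.07207190, 0.07973204, 0.06567413)

# Mean expected values copied from "sim x1" (ever used crack)
ev_ever <- c(0.36251812, 0.23137196, 0.13362403, 0.08733736,
             0.06301846, 0.06777509, 0.05435498)

# First difference = probability for ever users minus probability for never users
fd <- setNames(ev_ever - ev_never, income_levels)

print(round(fd, 4))
# The probabilities in each group sum to 1, so the differences sum to about 0
print(round(sum(fd), 6))
```

The positive values in the two lowest income ranges are offset by the negative values in the five higher ranges.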