Here I use the same PSID variables as in the paper. I estimate the effect of incarceration on mortality, where incarceration is measured through the non-response variable, and examine an initial imputation model.
The variables I considered were (I = time-invariant, V = time-varying):
- Gender (I)
- Age (I)
- Race (I)
- Income (V)
- Education (V)
- Poor health (V)
- Incarceration / non-response (V)
knitr::opts_knit$set(root.dir = 'Users/sdaza/Google Drive/01Projects/01IncarcerationHealth/')
I consider records since 1968 and respondents aged 18 or older, which gives a sample of 52791 individuals. During that period 6218 people died. The median age at death was 71.
Differences by gender. The crude odds ratios are below 1, which suggests that the timing of death matters, not just the proportion who die.
# explore differences by gender
x <- org[, .(male   = max(male, na.rm = TRUE),
             prison = max(nrprison, na.rm = TRUE),
             death  = max(died, na.rm = TRUE),
             age    = max(agei, na.rm = TRUE)),
         by = pid]
# these numbers are different because of age >= 18
table(x[, .(prison, death, male)]) # very small sample sizes for women
, , male = 0

      death
prison     0     1
     0 23860  2974
     1    53     2

, , male = 1

      death
prison     0     1
     0 22149  3196
     1   511    46
# odds ratio male
a <- table(x[male == 1, .(prison, death)])
(a[2,2]*a[1,1]) / (a[2,1]*a[1,2]) # below 1: the timing of death may matter more than whether death occurs
[1] 0.6238559
# odds ratio female
b <- table(x[male == 0, .(prison, death)])
(b[2,2]*b[1,1]) / (b[2,1]*b[1,2]) # even lower
[1] 0.3027496
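The two ratios above can be reproduced from the printed counts, and a rough interval helps show how little information the female cells carry. The sketch below is an illustration only: `odds_ratio_ci` is a hypothetical helper (not part of the original analysis) that uses the standard Woolf log-scale approximation, and the matrices are typed in from the tables above rather than computed from `org`.

```r
# Crude odds ratio with a Woolf (log-scale) 95% confidence interval.
# `odds_ratio_ci` is a hypothetical helper, not part of the original analysis.
odds_ratio_ci <- function(tab, level = 0.95) {
  stopifnot(all(dim(tab) == c(2, 2)), all(tab > 0))
  or <- (tab[2, 2] * tab[1, 1]) / (tab[2, 1] * tab[1, 2])
  se_log <- sqrt(sum(1 / tab))            # standard error of log(OR)
  z <- qnorm(1 - (1 - level) / 2)
  c(or    = or,
    lower = exp(log(or) - z * se_log),
    upper = exp(log(or) + z * se_log))
}

# Counts copied from the tables above (rows: prison 0/1, columns: death 0/1)
men   <- matrix(c(22149, 511, 3196, 46), nrow = 2,
                dimnames = list(prison = c("0", "1"), death = c("0", "1")))
women <- matrix(c(23860, 53, 2974, 2), nrow = 2,
                dimnames = list(prison = c("0", "1"), death = c("0", "1")))

or_men   <- odds_ratio_ci(men)
or_women <- odds_ratio_ci(women)
print(round(rbind(men = or_men, women = or_women), 3))
```

With only 2 deaths among incarcerated women, the interval for women is much wider than the one for men and includes 1, which is consistent with the note about small sample sizes.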
Just to get an idea of the missing data:
countmis(org)
 dghealth     eduic linc_adjc  nrprison     frace
    0.377     0.133     0.079     0.070     0.001
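`countmis` is a helper whose definition is not shown here. Assuming it reports the share of missing values per column, sorted from highest to lowest, a base-R equivalent might look like the sketch below. The data frame `toy` is made up for illustration; only the column names are borrowed from the output above.

```r
# Toy data; column names come from the output above, values are made up
toy <- data.frame(
  dghealth  = c(NA, 1, NA, 0, 1),
  eduic     = c(12, NA, 10, 16, 9),
  linc_adjc = c(9.8, 10.1, 10.4, 9.9, 10.0)
)

# Proportion of missing values per column, highest first
# (an assumption about what countmis() reports)
prop_missing <- sort(round(colMeans(is.na(toy)), 3), decreasing = TRUE)
print(prop_missing)
```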
Health and education have the highest proportions of missing values. I defined multilevel imputation models that use both time-invariant and time-varying variables, plus age and year. As an example, the income imputation model is:
\[
\begin{aligned}
\text{income} = {} & \alpha + \text{year} + \text{age} + \text{edu} + \text{prison} \\
& + \text{health} + \text{dropout} + \text{death} + \delta_r + \epsilon_i
\end{aligned}
\]
Death and dropout are time-invariant variables. For this exercise, I generated 10 imputations. Final versions should use more iterations and imputations (e.g., 30 iterations and 60 imputations). Still, the mixing does not look that bad. In these runs, increasing the number of imputations increased the standard errors, so there seems to be a trade-off here.
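For reference, Rubin's rules combine the per-imputation fits into one estimate whose total variance is W + (1 + 1/m)B, where W is the average within-imputation variance, B is the between-imputation variance, and m is the number of imputations. The base-R sketch below pools a single coefficient this way. `pool_rubin` is a hypothetical helper and the inputs are made-up numbers, not results from this analysis.

```r
# Pool one coefficient across m imputed-data fits with Rubin's rules.
# `pool_rubin` is a hypothetical helper; inputs are per-imputation
# point estimates and standard errors.
pool_rubin <- function(est, se) {
  m    <- length(est)
  qbar <- mean(est)                 # pooled point estimate
  W    <- mean(se^2)                # within-imputation variance
  B    <- var(est)                  # between-imputation variance
  Tvar <- W + (1 + 1 / m) * B       # total variance
  c(estimate = qbar, se = sqrt(Tvar), within = W, between = B)
}

# Made-up estimates and standard errors, for illustration only
est <- c(0.70, 0.75, 0.68, 0.80, 0.66)
se  <- c(0.15, 0.16, 0.15, 0.17, 0.15)

pooled <- pool_rubin(est, se)
print(round(pooled, 4))
```

With few imputations, B is itself estimated from a handful of values, so the pooled standard error can move noticeably from one run to the next.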


Next, I examine the distribution of the imputed variables by age and year. The incarceration variable shows a strange pattern by year; I am still waiting for a reply from the PSID staff. The health variable also looks odd, and I am not sure what to do about it.






Pooled Models (only non-response incarceration models)
A simple model without health:
Multiple imputation results:
MIcombine.default(models)
               results           se      (lower      upper) missInfo
prison      0.71889240 0.1923165882  0.32813481  1.10964999     54 %
male        0.45622024 0.0264458509  0.40436694  0.50807355      5 %
agec        0.07514924 0.0009010345  0.07337851  0.07691997     14 %
fraceblack  0.35121080 0.0291765885  0.29401353  0.40840808      4 %
fraceother -0.41997053 0.0752877157 -0.56753381 -0.27240726      1 %
linc_adjc  -0.06148175 0.0243441412 -0.11349792 -0.00946558     81 %
eduic      -0.04811699 0.0057679552 -0.05986338 -0.03637059     56 %
Now, adding health:
Multiple imputation results:
MIcombine.default(models)
               results          se      (lower       upper) missInfo
prison      0.71935329 0.191141343  0.33129507  1.107411516     53 %
male        0.46159223 0.027112318  0.40838162  0.514802831     10 %
agec        0.07175711 0.002223414  0.06693337  0.076580853     87 %
fraceblack  0.32504446 0.031778732  0.26246118  0.387627739     19 %
fraceother -0.42156553 0.080625909 -0.57998917 -0.263141900     14 %
linc_adjc  -0.04623545 0.023440393 -0.09595447  0.003483568     78 %
eduic      -0.03898820 0.008989499 -0.05827634 -0.019700054     83 %
dghealth    0.47175029 0.142983131  0.15191206  0.791588508     97 %
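The coefficients in the two pooled tables above are on a log scale, so exponentiating them gives ratios that are easier to read. The sketch below does this for the `prison` row only, with values copied from the tables. Whether the result is an odds ratio or a hazard ratio depends on the outcome model, which is not shown in this section, so the labels are kept generic.

```r
# Prison coefficient and interval, copied from the two pooled tables above
prison <- rbind(
  no_health   = c(est = 0.71889240, lower = 0.32813481, upper = 1.10964999),
  with_health = c(est = 0.71935329, lower = 0.33129507, upper = 1.107411516)
)

# Ratio scale (odds or hazard ratio, depending on the outcome model)
ratios <- exp(prison)
print(round(ratios, 2))
```

Both point estimates are slightly above 2 on the ratio scale, and both lower bounds stay above 1.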
The effect is positive but rather noisy. Because of time-varying confounding, I also fit a marginal structural model (MSM); with that adjustment the effect is smaller (0.66 versus 0.72). A gender interaction does not make sense with this sample (the cell sizes are too small).
Multiple imputation results:
MIcombine.default(modelsMSM)
               results          se      (lower       upper) missInfo
prison      0.65600143 0.232651456  0.18347797  1.128524882     54 %
male        0.46189978 0.027769376  0.40737692  0.516422642     12 %
agec        0.07155314 0.002226179  0.06679834  0.076307949     81 %
fraceblack  0.32105243 0.033988985  0.25426872  0.387836139     14 %
fraceother -0.42195532 0.073522175 -0.56619277 -0.277717860      9 %
linc_adjc  -0.04391700 0.024404307 -0.09577239  0.007938395     79 %
eduic      -0.03945674 0.008840792 -0.05840131 -0.020512160     82 %
dghealth    0.45750937 0.143537325  0.13692348  0.778095264     96 %
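As a rough illustration of the idea behind an MSM adjustment (not the code used for the results above), the sketch below builds stabilized inverse-probability-of-treatment weights on simulated person-period data. Everything in it is an assumption: the data are simulated, the variable names only echo the ones in this report, and the treatment models are deliberately simplified (for example, they ignore treatment history).

```r
set.seed(1)

# Simulated person-period data: 200 people, 5 periods each (illustration only)
n_id <- 200
n_t  <- 5
d <- data.frame(id = rep(seq_len(n_id), each = n_t),
                t  = rep(seq_len(n_t), times = n_id))
d$male   <- rep(rbinom(n_id, 1, 0.5), each = n_t)    # baseline covariate
d$health <- rbinom(nrow(d), 1, 0.3)                   # time-varying confounder
d$prison <- rbinom(nrow(d), 1,
                   plogis(-2 + 0.8 * d$health + 0.5 * d$male))

# Numerator model: treatment given baseline covariates only
num <- glm(prison ~ male + t, family = binomial, data = d)
# Denominator model: treatment given baseline and time-varying covariates
den <- glm(prison ~ male + t + health, family = binomial, data = d)

# Probability of the treatment actually received under each model
p_num <- ifelse(d$prison == 1, fitted(num), 1 - fitted(num))
p_den <- ifelse(d$prison == 1, fitted(den), 1 - fitted(den))

# Stabilized weight: cumulative product of the ratio within each person
# (rows are already ordered by id and period)
d$sw <- ave(p_num / p_den, d$id, FUN = cumprod)
print(summary(d$sw))
```

In a full analysis these weights would then be passed to the outcome model for each imputed data set before pooling, and very large weights would usually be inspected or truncated.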
