This is a similar exercise, this time using PSID sampling weights. The procedure to define individual weights is as follows.
From around 70,000 individual records, I save the longitudinal weights. I define our analytical sample and consider only the first weight (i.e., the one at the start of the period of observation for individual \(i\)). If individual \(i\) doesn’t have any weight, I take the weights of the members of the family unit \(u\) at time \(t\), compute their average, and use it for individual \(i\). Using this procedure, I lose only 400 individuals.
It’s important to note that non-sample members (that is, everyone who doesn’t have a sampling weight) have no probability of selection.
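The weight assignment described above can be sketched roughly as follows. This is only an illustration on toy data, not the code used for the analysis: the column names (`pid`, `fid`, `year`, `weight`) are hypothetical, and the order of the steps (fill from the family-unit average first, then keep the first available weight) is one possible reading of the procedure.

```r
library(data.table)

# Toy long-format data; all column names here are hypothetical
dat <- data.table(
  pid    = c(1, 1, 2, 2, 3, 3),
  fid    = c(10, 10, 10, 10, 20, 20),            # family unit
  year   = c(1968, 1969, 1968, 1969, 1968, 1969),
  weight = c(12, 15, NA, NA, NA, NA)
)

# Average of the non-missing weights within each family unit and year
dat[, fam_weight := mean(weight, na.rm = TRUE), by = .(fid, year)]
# mean() of an empty set returns NaN; treat it as missing
dat[is.nan(fam_weight), fam_weight := NA_real_]

# Use the individual's own weight when available, the family average otherwise
dat[, w := fifelse(is.na(weight), fam_weight, weight)]

# Keep the first available weight per individual
setorder(dat, pid, year)
first_w <- dat[, .(w = w[!is.na(w)][1]), by = pid]
first_w
```

In this toy example, individual 3 belongs to a family unit with no weights at all and therefore ends up without a weight, which corresponds to the kind of case that is lost in the procedure.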
I estimate the effect of incarceration on mortality, where incarceration is measured using the non-response variable, and examine an initial imputation model.
The variables I consider are (I = time-invariant, V = time-varying):
- Gender (I)
- Age (I)
- Race (I)
- Income (V)
- Education (V)
- Poor health (V)
- Incarceration / non-response (V)
knitr::opts_knit$set(root.dir = 'Users/sdaza/Google Drive/01Projects/01IncarcerationHealth/')
I consider records since 1968 and respondents aged 18 or older, that is, a sample of 53764. During that period, 6584 people died. The median age at death was 71.
Differences by gender. The odds ratios of death by incarceration are lower than 1, which suggests that the timing of death matters, not just the proportion who die!
# explore differences by gender
x <- org[, .(male   = max(male, na.rm = TRUE),
             prison = max(nrprison, na.rm = TRUE),
             death  = max(died, na.rm = TRUE),
             age    = max(agei, na.rm = TRUE)),
         by = pid]
# these numbers are different because of age >= 18
table(x[, .(prison, death, male)]) # very small sample sizes for women
, , male = 0

      death
prison     0     1
     0 24130  3157
     1    62     2

, , male = 1

      death
prison     0     1
     0 22443  3375
     1   545    50
# odds ratio for men
a <- table(x[male == 1, .(prison, death)])
# not higher than 1: the timing of death may matter more than whether death occurs
(a[2,2]*a[1,1]) / (a[2,1]*a[1,2])
[1] 0.6100714
# odds ratio for women
b <- table(x[male == 0, .(prison, death)])
(b[2,2]*b[1,1]) / (b[2,1]*b[1,2]) # even lower
[1] 0.2465591
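To get a sense of how uncertain these odds ratios are, especially with the small cell counts for women, a confidence interval can be attached to each one. The sketch below is an addition, not part of the original analysis: it re-enters the counts from the tables above and uses the standard Woolf (log-scale Wald) interval. The helper name `odds_ratio` is hypothetical.

```r
# Counts copied from the 2 x 2 tables printed above
# rows: prison = 0, 1; columns: death = 0, 1 (matrices are filled by column)
men <- matrix(c(22443, 545, 3375, 50), nrow = 2,
              dimnames = list(prison = c("0", "1"), death = c("0", "1")))
women <- matrix(c(24130, 62, 3157, 2), nrow = 2,
                dimnames = list(prison = c("0", "1"), death = c("0", "1")))

# Odds ratio with a Woolf (log-scale Wald) confidence interval
odds_ratio <- function(tab, level = 0.95) {
  or <- (tab[2, 2] * tab[1, 1]) / (tab[2, 1] * tab[1, 2])
  se_log <- sqrt(sum(1 / tab))            # standard error of log(OR)
  z <- qnorm(1 - (1 - level) / 2)
  c(or = or,
    lower = exp(log(or) - z * se_log),
    upper = exp(log(or) + z * se_log))
}

odds_ratio(men)
odds_ratio(women)
```

The point estimates reproduce the values computed above; the interval for women is expected to be much wider because only two incarcerated women died.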
Just to get an idea of the missing data:
countmis(org)
 dghealth     eduic linc_adjc  nrprison     frace
    0.369     0.131     0.080     0.072     0.001
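The `countmis()` helper is not defined in this section. Assuming it reports the proportion of missing values per column, sorted from highest to lowest and restricted to columns with at least one missing value, a hypothetical base R stand-in (`prop_missing`, a made-up name) could look like this:

```r
# Hypothetical stand-in for countmis(): proportion of missing values per
# column, keeping only columns with missing values, sorted in decreasing order
prop_missing <- function(d, digits = 3) {
  p <- vapply(d, function(col) mean(is.na(col)), numeric(1))
  round(sort(p[p > 0], decreasing = TRUE), digits)
}

# Toy data to illustrate the output format
toy <- data.frame(a = c(1, NA, 3, NA),
                  b = c("x", "y", NA, "z"),
                  c = 1:4)
prop_missing(toy)
```

Because a `data.table` is also a list of columns, the same function should work on an object like `org`.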
The highest proportions of missing values are in health and education. I defined multilevel imputation models that use both time-invariant and time-varying variables, as well as age and year. As an example, the income imputation model is:
\[
\begin{aligned}
\text{income} = {} & \alpha + \text{year} + \text{age} + \text{edu} + \text{prison} \\
& + \text{health} + \text{dropout} + \text{death} + \delta_r + \epsilon_i
\end{aligned}
\]
Death and dropout are time-invariant variables. For this exercise, I generated 20 imputations. Final versions of this exercise should use more iterations and imputations (e.g., 30 iterations, 60 imputations). In any case, the mixing of the chains doesn’t look that bad. Increasing the number of imputations appears to increase the standard errors, so there is a trade-off here.
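The pooled results in this document come from `MIcombine()`. As a rough illustration of how estimates from several imputed data sets are typically combined (Rubin's rules), here is a minimal base R sketch for a single coefficient. The function name `pool_rubin` and the toy inputs are hypothetical, and the sketch leaves out the degrees-of-freedom calculation used for confidence intervals.

```r
# Pool one coefficient across m imputed data sets with Rubin's rules
# estimates: the m point estimates; variances: their m squared standard errors
pool_rubin <- function(estimates, variances) {
  m <- length(estimates)
  qbar <- mean(estimates)            # pooled point estimate
  within <- mean(variances)          # within-imputation variance
  between <- stats::var(estimates)   # between-imputation variance
  total <- within + (1 + 1 / m) * between
  c(estimate = qbar, se = sqrt(total), within = within, between = between)
}

# Toy example with three imputations
pool_rubin(estimates = c(1, 2, 3), variances = c(0.5, 0.5, 0.5))
```

A large between-imputation variance relative to the total is what drives a high share of missing information for a coefficient.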


Then, I examine the distribution of the imputed variables by age and year. The incarceration variable shows a weird pattern by year; I am still waiting for a reply from the PSID staff. Health also looks weird, and I don’t know what to do about it.






Without Sampling Weights
Not including the health covariate.
Model 1: Standard Model
Multiple imputation results:
MIcombine.default(models)
               results           se      (lower      upper) missInfo
prison      0.60915266 0.3145192800 -0.07236020  1.29066552     86 %
male        0.45427243 0.0257177752  0.40384519  0.50469968      6 %
agec        0.07523002 0.0009869461  0.07326874  0.07719130     33 %
fraceblack  0.34387285 0.0292159606  0.28653990  0.40120579     10 %
fraceother -0.42188168 0.0717197482 -0.56244995 -0.28131340      0 %
linc_adjc  -0.06616004 0.0241303745 -0.11791655 -0.01440353     83 %
eduic      -0.04884552 0.0043554316 -0.05744057 -0.04025047     23 %
Model 2: Marginal Structural Model
Multiple imputation results:
MIcombine.default(modelsMSM)
               results          se      (lower      upper) missInfo
prison      0.57493475 0.398537745 -0.27904705  1.42891656     82 %
male        0.45366990 0.026405398  0.40190149  0.50543831      5 %
agec        0.07490819 0.001186132  0.07255196  0.07726443     33 %
fraceblack  0.33735188 0.033154171  0.27223666  0.40246709     13 %
fraceother -0.41658902 0.069003829 -0.55183745 -0.28134060      1 %
linc_adjc  -0.06349345 0.023671520 -0.11415777 -0.01282914     82 %
eduic      -0.04900787 0.004506372 -0.05791635 -0.04009940     26 %
With Sampling Weights
Model 3: Standard Model
Multiple imputation results:
MIcombine.default(models)
               results          se      (lower       upper) missInfo
prison      0.76868946 0.459993636 -0.22370545  1.761084377     85 %
male        0.45815774 0.032000960  0.39541498  0.520900504      5 %
agec        0.08265764 0.001593325  0.07951515  0.085800130     22 %
fraceblack  0.24856883 0.062828651  0.12537803  0.371759637      5 %
fraceother -0.51449164 0.088080277 -0.68712815 -0.341855122      1 %
linc_adjc  -0.07659407 0.036744885 -0.15676665  0.003578505     89 %
eduic      -0.05462636 0.005489502 -0.06546461 -0.043788116     24 %
Model 4: Marginal Structural Model
Multiple imputation results:
MIcombine.default(modelsMSM)
               results          se      (lower       upper) missInfo
prison      0.74966358 0.529419122 -0.39304630  1.892373454     85 %
male        0.45729601 0.032235699  0.39409689  0.520495134      5 %
agec        0.08237502 0.001646513  0.07912000  0.085630040     26 %
fraceblack  0.23917647 0.064957423  0.11169873  0.366654215     10 %
fraceother -0.50479494 0.089058295 -0.67936840 -0.330221470      3 %
linc_adjc  -0.07317306 0.036410015 -0.15250006  0.006153939     88 %
eduic      -0.05479641 0.005566744 -0.06580398 -0.043788840     27 %
