I obtain estimates using the NSLY79 sample because the specification of sampling weights is much easier than using the PSID: people began being observed at the same time!
Again, I am using only 10 imputations!
The variables I considered were:
I considered records since 1980 and respondents 18 years old or more, that is, a sample of 12625. During that period 661 people died. The median age of death was 41.
Just to get an idea of the missing data:
countmis(org)
clincome_adj job healthw cdeltot married rprison
0.214 0.066 0.059 0.058 0.058 0.041
cpedu cedui
0.037 0.000
The highest proportion of missing cases corresponds to income. I defined imputation multivel models, where I use both time invariant and variant variables, age and year. Just to provide an example, the income imputation model is:
\[income = \alpha + year + age + edu + parents\_edu + prison + married \\ + health + delinquency + job + dropout + death + \delta_r + \epsilon_i \]
I don’t use weights at all when defining imputation models. Death and dropout are time-invariant variables. One of the most important predictor of income missigness is incarceration! For this exercise, I generated 10 imputations. Final versions of this exercise should use more iterations and imputations (e.g., 30 iterations, 60 imputations).
# explore differences by gender
hist(org$agei, main = "Distribution of age")
x <- org[, .(male = max(male, na.rm = TRUE),
prison = max(rprison, na.rm = TRUE),
death = max(died, na.rm = TRUE), age = max(agei, na.rm = TRUE)), id]
# these numbers are different because of age >= 18 and year >= 1980
table(x[, .(prison, death, male)])
, , male = 0
death
prison 0 1
0 5936 239
1 61 17
, , male = 1
death
prison 0 1
0 5386 341
1 581 64
# odds ratio male
a <- table(x[male == 1, .(prison, death)])
(a[2,2]*a[1,1]) / (a[2,1]*a[1,2])
[1] 1.739866
# odd ratios female
b <- table(x[male == 0, .(prison, death)])
(b[2,2]*b[1,1]) / (b[2,1]*b[1,2]) # huge
[1] 6.921737
# time between first incarceration event and death by gender!
x <- org[, .(id ,rprison, died, male, agei, stop)]
a <- x[rprison == 1, .(prison = min(stop), age = min(agei)), .(id, male)]
b <- x[died == 1, .(death = min(stop)), .(id, male)]
#anyDuplicated(b)
#anyDuplicated(a)
setkey(a, id, male)
setkey(b, id, male)
c <- b[a]
# only 81 cases of people who have been incarcerated and died in the sample
length(c[!is.na(death), death - prison])
[1] 81
mean(c[!is.na(death) & male == 1, age]) # men are younger
[1] 26.6875
mean(c[!is.na(death) & male == 0, age]) # women are older
[1] 29.23529
# mean and median of year between first incarceration event and death
# men
mean(c[!is.na(death) & male == 1, death - prison])
[1] 14.15625
median(c[!is.na(death) & male == 1, death - prison])
[1] 13.5
hist(c[!is.na(death) & male == 1, death - prison], main = "Men: Distribution time between incarceration and death")
# women
mean(c[!is.na(death) & male == 0, death - prison])
[1] 13.29412
median(c[!is.na(death) & male == 0, death - prison])
[1] 12
hist(c[!is.na(death) & male == 0, death - prison], main = "Women: Distribution time between incarceration and death")
Mixing of imputations:
Gray lines represent imputed data, while red lines observed data.
Income imputations are far below the observed values, I guess mainly to the effect of prison on income missingness. Probably, it would be a good idea to use a lag version of income to improve the imputation, but it doesn’t too critical.
Incarceartion looks fine. After the 2004, NLYS recorded non-response due to incarceration, that is why there are no missing records during those years.
Distribution of sampling weights!
Model 1
Multiple imputation results:
MIcombine.default(models)
results se (lower upper) missInfo
prison 0.05844335 0.13637113 -0.20884505 0.325731758 1 %
male 0.52322639 0.08484666 0.35692992 0.689522855 0 %
magef19 -0.18304748 0.12497753 -0.42799893 0.061903976 0 %
magef20 0.14054481 0.12068848 -0.09600026 0.377089885 0 %
magef21 0.06108670 0.13177843 -0.19719429 0.319367686 0 %
magef22 0.39196654 0.11812839 0.16043914 0.623493938 0 %
magef23 0.52082845 0.21309940 0.10316129 0.938495610 0 %
raceblack 0.25216301 0.12067392 0.01564484 0.488681169 1 %
racenon-hispanic/non-black 0.25761083 0.12464618 0.01330041 0.501921249 2 %
cedui -0.03627544 0.01919190 -0.07389332 0.001342446 2 %
cpedu -0.02003647 0.01432104 -0.04813576 0.008062817 9 %
clincome_adj -0.01975688 0.01642951 -0.05199882 0.012485049 10 %
healthw 0.83907566 0.10570916 0.63188895 1.046262377 0 %
job -0.76537225 0.10104180 -0.96342877 -0.567315725 3 %
married -0.83233637 0.09413722 -1.01684304 -0.647829702 1 %
cdeltot 0.08505121 0.06561958 -0.04366846 0.213770884 8 %
Model 2: Marginal structural model
Multiple imputation results:
MIcombine.default(modelsMSM)
results se (lower upper) missInfo
prison 0.281064758 0.19620595 -0.10471927 0.666848785 16 %
magef19 -0.194962483 0.13907104 -0.46766175 0.077736781 6 %
magef20 -0.031991808 0.15602230 -0.33789005 0.273906438 5 %
magef21 0.070066585 0.14210109 -0.20847563 0.348608804 3 %
magef22 0.365676881 0.12695691 0.11683424 0.614519521 2 %
magef23 0.493597994 0.22522961 0.05213245 0.935063539 2 %
male 0.469803620 0.09460129 0.28436504 0.655242200 3 %
raceblack 0.236583066 0.13763483 -0.03329556 0.506461692 6 %
racenon-hispanic/non-black 0.304874084 0.13600363 0.03825046 0.571497703 4 %
cedui -0.004109759 0.02197381 -0.04718881 0.038969294 4 %
cpedu -0.028272743 0.01627431 -0.06020967 0.003664184 10 %
clincome_adj -0.014768386 0.01983132 -0.05377729 0.024240514 17 %
healthw 0.791785692 0.10899082 0.57813693 1.005434451 3 %
job -0.727400945 0.10834780 -0.93979810 -0.515003792 4 %
married -0.856636235 0.10068798 -1.05398434 -0.659288128 1 %
cdeltot 0.082107232 0.09021496 -0.09655979 0.260774252 29 %
I use the package survey in R to estimate standard errors. This means that I am not getting robust standard errors beside adjustments related to the sampling design.
The marginal structural models are not taking into account sampling weights when redefining inverse probability weights. MSM weights are multiplied by sampling weights. That’s it.
Model 3
models <- list()
for (i in 1:10) { # number of imputation
t <- limp[.imp == i]
# definition of survey design
ds <- svydesign(id=~cluster, weights=~wt, strata=~stratum, data = t, nest = TRUE)
models[[i]] <- svycoxph( Surv(start, stop, died) ~ prison + male + magef + race +
cedui + cpedu + clincome_adj + healthw +
job + married + cdeltot, design = ds)
}
summary(mitools::MIcombine(models))
Multiple imputation results:
MIcombine.default(models)
results se (lower upper) missInfo
prison -0.24017020 0.18957904 -0.61178584 0.131445436 3 %
male 0.50469309 0.10288427 0.30304354 0.706342644 0 %
magef19 -0.12876017 0.14189709 -0.40687336 0.149353031 0 %
magef20 0.28522062 0.13891711 0.01294807 0.557493173 0 %
magef21 0.09039084 0.17222650 -0.24716692 0.427948592 0 %
magef22 0.44253636 0.14497742 0.15838578 0.726686940 0 %
magef23 0.97435253 0.24283737 0.49839999 1.450305078 0 %
raceblack 0.31434706 0.13635055 0.04709867 0.581595460 1 %
racenon-hispanic/non-black 0.43258427 0.13962983 0.15888436 0.706284178 3 %
cedui -0.03329677 0.02339753 -0.07915964 0.012566092 3 %
cpedu -0.03639282 0.01763705 -0.07104373 -0.001741918 14 %
clincome_adj -0.02693641 0.01991067 -0.06597756 0.012104733 6 %
healthw 0.87136855 0.12841117 0.61968690 1.123050202 0 %
job -0.78845098 0.13546492 -1.05396060 -0.522941365 1 %
married -1.01004447 0.13088710 -1.26657867 -0.753510273 0 %
cdeltot 0.09568691 0.07158928 -0.04478790 0.236161728 9 %
Model 4: Marginal structural model
modelsmsm <- list()
for (i in 1:10) { # number of imputations
t <- limp[.imp == i]
w1 <- ipwtm(exposure = dropout, family = "survival",
numerator = ~ male + magef + race + cdeltot + cpedu,
denominator = ~ male + magef + race + cdeltot + cpedu +
prison + cedui + clincome_adj + healthw +
job + married,
id = id,
tstart = start, timevar = stop,
type = "first",
data = t)
w2 <- ipwtm(exposure = prison, family = "survival",
numerator = ~ male + magef + race + cdeltot + cpedu,
denominator = ~ male + magef + race + cdeltot + cpedu +
cedui + clincome_adj + healthw +
job + married, id = id,
tstart = start, timevar = stop,
type = "first",
data = t)
t[, nwt := wt * w1$ipw.weights * w2$ipw.weights]
ds <- svydesign(id=~cluster, weights=~nwt, strata=~stratum, data = t, nest = TRUE)
modelsmsm[[i]] <- svycoxph( Surv(start, stop, died) ~ prison + magef + male + race +
cedui + cpedu + clincome_adj + healthw +
job + married + cdeltot + cluster(id),
design = ds)
}
summary(mitools::MIcombine(modelsmsm))
Multiple imputation results:
MIcombine.default(modelsmsm)
results se (lower upper) missInfo
prison 0.05413964 0.21668876 -0.37203762 0.4803169045 17 %
magef19 -0.16297699 0.13701015 -0.43152060 0.1055666176 2 %
magef20 0.15793619 0.16109746 -0.15784737 0.4737197442 3 %
magef21 0.08925802 0.16793561 -0.23989361 0.4184096598 1 %
magef22 0.42247840 0.14513876 0.13801110 0.7069456960 0 %
magef23 0.96272387 0.24396024 0.48456477 1.4408829770 1 %
male 0.46807486 0.10627754 0.25976272 0.6763870024 2 %
raceblack 0.28493743 0.15606506 -0.02134900 0.5912238628 10 %
racenon-hispanic/non-black 0.48789198 0.14797457 0.19765514 0.7781288193 7 %
cedui -0.01159434 0.02329146 -0.05724939 0.0340607139 3 %
cpedu -0.03928783 0.01950280 -0.07766746 -0.0009082006 18 %
clincome_adj -0.02331191 0.02077763 -0.06407914 0.0174553199 9 %
healthw 0.83674805 0.13375607 0.57458753 1.0989085703 1 %
job -0.75341865 0.13965974 -1.02715751 -0.4796797875 2 %
married -1.04232710 0.13295871 -1.30292178 -0.7817324158 0 %
cdeltot 0.09431891 0.08288878 -0.06942651 0.2580643352 25 %
It looks samping weights are informative, changing point estimates (standard errors are big though!).