The data set I’ve chosen is the PSID from the University of Michigan with 5 waves (waves are two years apart) from 2005 through 2013. The event I’ve focused on is the transition from good to poor health, with the main discriminant being the process of moving and whether or not the move was voluntary or if it was involuntary (i.e. being displaced).
The main predictor is:
move.y: a categorical variable that can take one of three outcomes: 1) not move 2) not displaced 3) displaced
Demographic characteristics:
race: categorical: nhw, nhb, nha, nhother and hispanic
age: age squared for curvilinear effect off age
sex: dichotomous 1 is female and 0 is male
educ: dichotomous either less than high school (lths) or high school + (hs+)
The predictors were defined above in A).
The episode is whether or not someone moved in the two year period between the waves, and it has one of three possible outcomes: 1) not move 2) move but not displaced 3) move and displaced. The event is transitiong from good to poor health. The duration is two years per wave for five waves from 2005 through 2013. the censored events are those who never make the transition from good to poor health.
## C) Carry out the following analysis
### a. Fit the Cox model the data.
### i. Include all main effects in the model
#I did not include the 300 lines of code for the cleaning and conditioning of the data. Datcomb is the main data that contains all of five waves
datcomb <- datcomb %>%
mutate(
race = fct_relevel(race, 'nhw','nhb','hisp','asn','nh-other'),
move.y = as.character(ifelse(move == 'not move', 'not move',
ifelse(move.why == 'not displaced', 'not displaced', 'displaced'))),
educ = as.character(ifelse(edu <=11, 'lths',
ifelse(edu>=12, 'hs+',NA))),
time_start = ifelse(year==2005,0,
ifelse(year==2007,1,
ifelse(year==2009, 2,
ifelse(year==2011, 3,4)))),
time = time_start +1,
healthd = ifelse(health == 'good', 0,1),
)%>%
arrange(id)
#str(datcomb)
datcomb <- datcomb %>%
mutate(
educ = fct_relevel(educ, 'hs+','lths'),
move.y = fct_relevel(move.y, 'not move','not displaced','displaced'))
datcomb <- datcomb[complete.cases(datcomb),]
#table(is.na(datcomb) )
#str(datcomb)
options(survey.lonely.psu = "adjust")
des = svydesign(ids = ~1,
strata = datcomb$strat,
weights = datcomb$wt,
data = datcomb)
#length(datcomb$strat)
#length(datcomb$wt)
#must get rid of zero weights
datcomb$wt=datcomb$wt+0.000001
datcomb$age2 <- datcomb$age^2
#I first used sex and age, but had to remove them for the discrete tie=me models as I could not get them to run with thos variables in there
fit1 <- coxreg(Surv(time = time_start, time2 = time, event = healtran)~ scale(age) + scale(age2) + move.y + educ + race + sex, data = datcomb, weights = wt)
summary(fit1)
## Call:
## coxreg(formula = Surv(time = time_start, time2 = time, event = healtran) ~
## scale(age) + scale(age2) + move.y + educ + race + sex, data = datcomb,
## weights = wt)
##
## Covariate Mean Coef Rel.Risk S.E. Wald p
## scale(age) -0.000 -0.675 0.509 0.015 0.000
## scale(age2) 0.000 0.804 2.235 0.014 0.000
## move.y
## not move 0.677 0 1 (reference)
## not displaced 0.276 0.013 1.013 0.010 0.181
## displaced 0.047 0.565 1.759 0.014 0.000
## educ
## hs+ 0.854 0 1 (reference)
## lths 0.146 0.726 2.067 0.010 0.000
## race
## nhw 0.497 0 1 (reference)
## nhb 0.273 0.514 1.672 0.011 0.000
## hisp 0.084 0.564 1.758 0.012 0.000
## asn 0.106 0.044 1.045 0.020 0.029
## nh-other 0.039 0.145 1.156 0.023 0.000
## sex 0.222 0.299 1.349 0.009 0.000
##
## Events 3993
## Total time at risk 60255
## Max. log. likelihood -868877
## LR test statistic 24327.32
## Degrees of freedom 10
## Overall p-value 0
#table(datcomb$move.y)
For the cox model we can see that all predictors are significant with the exception of moving and not being displaced. From this model it appears that all predictors are harmful, to one degree or another, to our reference group: nhw, male, high school grad. Also of not, the category of moving but not being displaced was not significant. As we shall see, this model is not necessarily the best fit for this data. ### ii. Test for an interaction between at least two of the predictors
fit1.int <- coxreg(Surv(time = time_start, time2 = time, event = healtran)~ scale(age) + scale(age2) + move.y + educ + race + sex + move.y*race + move.y*educ, data = datcomb, weights = wt)
summary(fit1.int)
## Call:
## coxreg(formula = Surv(time = time_start, time2 = time, event = healtran) ~
## scale(age) + scale(age2) + move.y + educ + race + sex + move.y *
## race + move.y * educ, data = datcomb, weights = wt)
##
## Covariate Mean Coef Rel.Risk S.E. Wald p
## scale(age) -0.000 -0.677 0.508 0.015 0.000
## scale(age2) 0.000 0.806 2.240 0.014 0.000
## move.y
## not move 0.677 0 1 (reference)
## not displaced 0.276 -0.108 0.897 0.015 0.000
## displaced 0.047 0.753 2.124 0.020 0.000
## educ
## hs+ 0.854 0 1 (reference)
## lths 0.146 0.661 1.936 0.012 0.000
## race
## nhw 0.497 0 1 (reference)
## nhb 0.273 0.600 1.822 0.014 0.000
## hisp 0.084 0.657 1.928 0.013 0.000
## asn 0.106 -0.090 0.914 0.022 0.000
## nh-other 0.039 0.041 1.042 0.026 0.114
## sex 0.222 0.293 1.340 0.009 0.000
## move.y:race
## not displaced:nhb -0.013 0.987 0.026 0.604
## displaced:nhb -0.822 0.439 0.041 0.000
## not displaced:hisp -0.181 0.835 0.029 0.000
## displaced:hisp -0.667 0.513 0.045 0.000
## not displaced:asn 0.426 1.532 0.030 0.000
## displaced:asn 0.520 1.683 0.042 0.000
## not displaced:nh-other 0.534 1.706 0.049 0.000
## displaced:nh-other 0.314 1.369 0.064 0.000
## move.y:educ
## not displaced:lths 0.320 1.377 0.023 0.000
## displaced:lths -0.015 0.985 0.032 0.648
##
## Events 3993
## Total time at risk 60255
## Max. log. likelihood -868151
## LR test statistic 25779.72
## Degrees of freedom 20
## Overall p-value 0
Having added the interactions between moving and both race/ethnicity and education we now see that the process of moving but not being displaced becomes protective, while being displaced is still harmful, where one is more than 2.2 times more likely to trasnition from good to poor health.
In terms of significance, one’s race is only significant, and harmful, for non-Hispanic black and Hispanic individuals while sex shows that females are more likely to trastion from good to poor health.
The interactions showed several that were statistically significant. I will get into a detailed discusssion below.
lrt1 <- lrtest(fit1,fit1.int)
lrt1
## Likelihood ratio test
##
## Model 1: Surv(time = time_start, time2 = time, event = healtran) ~ scale(age) +
## scale(age2) + move.y + educ + race + sex
## Model 2: Surv(time = time_start, time2 = time, event = healtran) ~ scale(age) +
## scale(age2) + move.y + educ + race + sex + move.y * race +
## move.y * educ
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 10 -868877
## 2 20 -868151 10 1452.4 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here we can see the second model, the one with the intercations is a significantly better fitting model
#schoenfeld residuals
des2 <- svydesign(ids = ~psu, strata = ~strat, weights = datcomb$wt, data = datcomb, nest = T)
#fit2.base <- svycoxph(Surv(time = time, event = healtran)~ move.y, data = datcomb, design = des)
#fit2.race <- svycoxph(Surv(time = time, event = healtran)~ move.y + race, data = datcomb, design = des)
fit2 <- svycoxph(Surv(time = time, event = healtran)~ move.y + race + sex +educ, data = datcomb, design = des2)
#summary(fit2.base)
#summary(fit2.race)
summary(fit2)
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (8833) clusters.
## svydesign(ids = ~psu, strata = ~strat, weights = datcomb$wt,
## data = datcomb, nest = T)
## Call:
## svycoxph(formula = Surv(time = time, event = healtran) ~ move.y +
## race + sex + educ, design = des2, data = datcomb)
##
## n= 60255, number of events= 3993
##
## coef exp(coef) se(coef) z Pr(>|z|)
## move.ynot displaced -0.02405 0.97624 0.06919 -0.348 0.728
## move.ydisplaced 0.58855 1.80137 0.09825 5.990 2.09e-09 ***
## racenhb 0.13523 1.14480 0.09400 1.439 0.150
## racehisp -0.16369 0.84901 0.11084 -1.477 0.140
## raceasn -1.56089 0.20995 0.08812 -17.713 < 2e-16 ***
## racenh-other -1.31145 0.26943 0.13371 -9.808 < 2e-16 ***
## sex 0.42115 1.52371 0.06063 6.947 3.74e-12 ***
## educlths 0.89654 2.45111 0.06098 14.701 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## move.ynot displaced 0.9762 1.0243 0.8524 1.1180
## move.ydisplaced 1.8014 0.5551 1.4858 2.1839
## racenhb 1.1448 0.8735 0.9522 1.3764
## racehisp 0.8490 1.1778 0.6832 1.0550
## raceasn 0.2099 4.7630 0.1766 0.2495
## racenh-other 0.2694 3.7116 0.2073 0.3502
## sex 1.5237 0.6563 1.3530 1.7160
## educlths 2.4511 0.4080 2.1750 2.7623
##
## Concordance= 0.706 (se = 0.006 )
## Likelihood ratio test= NA on 8 df, p=NA
## Wald test = 952.7 on 8 df, p=<2e-16
## Score (logrank) test = NA on 8 df, p=NA
schoenresid<-resid(fit2, type="schoenfeld")
summary(schoenresid)
## move.ynot displaced move.ydisplaced racenhb
## Min. :-0.20707 Min. :-0.09296 Min. :-0.1657
## 1st Qu.:-0.19827 1st Qu.:-0.09296 1st Qu.:-0.1607
## Median :-0.19485 Median :-0.08951 Median :-0.1527
## Mean : 0.06708 Mean : 0.01343 Mean : 0.2048
## 3rd Qu.: 0.79293 3rd Qu.:-0.06341 3rd Qu.: 0.8343
## Max. : 0.82439 Max. : 0.93659 Max. : 0.8473
## racehisp raceasn racenh-other
## Min. :-0.27433 Min. :-0.32835 Min. :-0.129875
## 1st Qu.:-0.14850 1st Qu.:-0.08490 1st Qu.:-0.035144
## Median :-0.12697 Median :-0.04778 Median :-0.020601
## Mean :-0.03155 Mean :-0.00289 Mean :-0.001424
## 3rd Qu.:-0.11746 3rd Qu.:-0.03304 3rd Qu.:-0.014826
## Max. : 0.88254 Max. : 0.96696 Max. : 0.985174
## sex educlths
## Min. :-0.316763 Min. :-0.27112
## 1st Qu.:-0.316084 1st Qu.:-0.25238
## Median :-0.315303 Median :-0.23172
## Mean : 0.006677 Mean : 0.04114
## 3rd Qu.: 0.683237 3rd Qu.: 0.72888
## Max. : 0.687223 Max. : 0.77429
#des2$variables
#length(des2$variables$time[des2$variables$healtran==1])
#length(des2$variables$healtran==1)
#length(schoenresid)
#table(datcomb$healtran)
fit.sr<-lm(schoenresid~des2$variables$time[des2$variables$healtran==1])
summary(fit.sr)
## Response move.ynot displaced :
##
## Call:
## lm(formula = `move.ynot displaced` ~ des2$variables$time[des2$variables$healtran ==
## 1])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.2865 -0.2702 -0.2551 0.7135 0.7676
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 0.041685 0.022799
## des2$variables$time[des2$variables$healtran == 1] 0.007551 0.006455
## t value Pr(>|t|)
## (Intercept) 1.828 0.0676 .
## des2$variables$time[des2$variables$healtran == 1] 1.170 0.2421
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4404 on 3991 degrees of freedom
## Multiple R-squared: 0.0003428, Adjusted R-squared: 9.23e-05
## F-statistic: 1.368 on 1 and 3991 DF, p-value: 0.2421
##
##
## Response move.ydisplaced :
##
## Call:
## lm(formula = move.ydisplaced ~ des2$variables$time[des2$variables$healtran ==
## 1])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.10801 -0.10503 -0.10158 -0.07846 0.92452
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 0.0100907 0.0153133
## des2$variables$time[des2$variables$healtran == 1] 0.0009919 0.0043356
## t value Pr(>|t|)
## (Intercept) 0.659 0.510
## des2$variables$time[des2$variables$healtran == 1] 0.229 0.819
##
## Residual standard error: 0.2958 on 3991 degrees of freedom
## Multiple R-squared: 1.311e-05, Adjusted R-squared: -0.0002374
## F-statistic: 0.05234 on 1 and 3991 DF, p-value: 0.8191
##
##
## Response racenhb :
##
## Call:
## lm(formula = racenhb ~ des2$variables$time[des2$variables$healtran ==
## 1])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.391 -0.371 -0.348 0.624 0.667
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 0.255350 0.024856
## des2$variables$time[des2$variables$healtran == 1] -0.015021 0.007037
## t value Pr(>|t|)
## (Intercept) 10.273 <2e-16 ***
## des2$variables$time[des2$variables$healtran == 1] -2.134 0.0329 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4802 on 3991 degrees of freedom
## Multiple R-squared: 0.00114, Adjusted R-squared: 0.0008899
## F-statistic: 4.556 on 1 and 3991 DF, p-value: 0.03287
##
##
## Response racehisp :
##
## Call:
## lm(formula = racehisp ~ des2$variables$time[des2$variables$healtran ==
## 1])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.25004 -0.11888 -0.09735 -0.08671 0.92279
##
## Coefficients:
## Estimate Std. Error
## (Intercept) -0.013660 0.017416
## des2$variables$time[des2$variables$healtran == 1] -0.005319 0.004931
## t value Pr(>|t|)
## (Intercept) -0.784 0.433
## des2$variables$time[des2$variables$healtran == 1] -1.079 0.281
##
## Residual standard error: 0.3364 on 3991 degrees of freedom
## Multiple R-squared: 0.0002914, Adjusted R-squared: 4.092e-05
## F-statistic: 1.163 on 1 and 3991 DF, p-value: 0.2808
##
##
## Response raceasn :
##
## Call:
## lm(formula = raceasn ~ des2$variables$time[des2$variables$healtran ==
## 1])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.32702 -0.08128 -0.04415 -0.03057 0.97058
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 0.0009784 0.0130322
## des2$variables$time[des2$variables$healtran == 1] -0.0011502 0.0036898
## t value Pr(>|t|)
## (Intercept) 0.075 0.940
## des2$variables$time[des2$variables$healtran == 1] -0.312 0.755
##
## Residual standard error: 0.2517 on 3991 degrees of freedom
## Multiple R-squared: 2.435e-05, Adjusted R-squared: -0.0002262
## F-statistic: 0.09718 on 1 and 3991 DF, p-value: 0.7553
##
##
## Response racenh-other :
##
## Call:
## lm(formula = `racenh-other` ~ des2$variables$time[des2$variables$healtran ==
## 1])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.13639 -0.03681 -0.02227 -0.01257 0.99321
##
## Coefficients:
## Estimate Std. Error
## (Intercept) -0.017734 0.009904
## des2$variables$time[des2$variables$healtran == 1] 0.004850 0.002804
## t value Pr(>|t|)
## (Intercept) -1.791 0.0734 .
## des2$variables$time[des2$variables$healtran == 1] 1.730 0.0838 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1913 on 3991 degrees of freedom
## Multiple R-squared: 0.0007489, Adjusted R-squared: 0.0004985
## F-statistic: 2.991 on 1 and 3991 DF, p-value: 0.0838
##
##
## Response sex :
##
## Call:
## lm(formula = sex ~ des2$variables$time[des2$variables$healtran ==
## 1])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.3285 -0.3239 -0.3193 0.6753 0.6847
##
## Coefficients:
## Estimate Std. Error
## (Intercept) -0.003614 0.024194
## des2$variables$time[des2$variables$healtran == 1] 0.003060 0.006850
## t value Pr(>|t|)
## (Intercept) -0.149 0.881
## des2$variables$time[des2$variables$healtran == 1] 0.447 0.655
##
## Residual standard error: 0.4674 on 3991 degrees of freedom
## Multiple R-squared: 5e-05, Adjusted R-squared: -0.0002006
## F-statistic: 0.1996 on 1 and 3991 DF, p-value: 0.6551
##
##
## Response educlths :
##
## Call:
## lm(formula = educlths ~ des2$variables$time[des2$variables$healtran ==
## 1])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.3149 -0.2942 -0.2697 0.6889 0.7363
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 0.047543 0.023396
## des2$variables$time[des2$variables$healtran == 1] -0.001903 0.006624
## t value Pr(>|t|)
## (Intercept) 2.032 0.0422 *
## des2$variables$time[des2$variables$healtran == 1] -0.287 0.7739
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.452 on 3991 degrees of freedom
## Multiple R-squared: 2.068e-05, Adjusted R-squared: -0.0002299
## F-statistic: 0.08252 on 1 and 3991 DF, p-value: 0.7739
fit.test<-cox.zph(fit2)
fit.test
## rho chisq p
## move.ynot displaced 0.0375 21.33 3.87e-06
## move.ydisplaced 0.1079 167.05 3.26e-38
## racenhb -0.1919 1023.88 1.16e-224
## racehisp -0.1154 237.93 1.11e-53
## raceasn 0.0696 37.25 1.04e-09
## racenh-other 0.0205 4.31 3.80e-02
## sex 0.0384 18.39 1.80e-05
## educlths -0.0295 10.56 1.16e-03
## GLOBAL NA 1730.88 0.00e+00
par(mfrow=c(3,3))
plot(fit.test, df=2)
par(mfrow=c(1,1))
As discuused earlier, The schoenfeld residuals are all significant, and as discussed earlier, this is probably not a good model for this data set.
#extract Martingale residuals
fit2 <- svycoxph(Surv(time = time, event = healtran)~ move.y +race + educ + sex + age , data = datcomb, design = des2)
#fit2.move2 <- svycoxph(Surv(time = time, event = healtran)~ strata(move.y), data = datcomb, design = des2)
#fit2.race <- svycoxph(Surv(time = time, event = healtran)~ race, data = datcomb, design = des2)
#fit2.educ <- svycoxph(Surv(time = time, event = healtran)~ race, data = datcomb, design = des2)
fit2.full <- svycoxph(Surv(time = time, event = healtran)~ race + educ + sex + age
+ strata(move.y), data = datcomb, design = des2)
#fit2.quad <- svycoxph(Surv(time = time, event = healtran)~ race + educ + sex + age
# + strata(move.y) + move.y^2, data = datcomb, design = des2)
#res.mar.move2<-resid(fit2.move2, type="martingale")
#res.mar.race<-resid(fit2.race, type="martingale")
res.mar.full <- resid(fit2.full, type="martingale")
res.mar <- resid(fit2, type="martingale")
#res.mar.quad <- resid(fit2.quad, type="martingale")
#length(res.mar.race2)
#length(des$variables$race)
#scatter.smooth(des$variables$move.y, res.mar.move2,degree = 2,span = 1, ylab="Martingale Residual",col=1, cex=.5, lpars=list(col = "red", lwd = 3))
#scatter.smooth(des$variables$race, res.mar.race,degree = 2,span = 1, ylab="Martingale Residual",col=1, cex=.5, lpars=list(col = "red", lwd = 3))
#scatter.smooth(des$variables$move.y, res.mar.move,degree = 2,span = 1, ylab="Martingale Residual",col=1, cex=.5, lpars=list(col = "red", lwd = 3))
#full model
scatter.smooth(des$variables$move.y, res.mar,degree = 2,span = 5, ylab="Martingale Residual",col=1, cex=.5, lpars=list(col = "red", lwd = 5))
#strata move.y
#scatter.smooth(des$variables$move.y, res.mar.full,degree = 2,span = 1, ylab="Martingale Residual",col=1, cex=.5, lpars=list(col = "red", lwd = 3))
#summary(fit2.full)
#res.mar<-resid(fit2, type="martingale")
#plot vs maternal education
#scatter.smooth(des$variables$move.y, res.mar.quad,degree = 2,span = 1, ylab="Martingale Residual",col=1, cex=.5, lpars=list(col = "red", lwd = 3))
#title(main="Martingale residuals for Mother' < High School's Education")
regTermTest(fit2, ~move.y, method="LRT")
## Working (Rao-Scott+F) LRT for move.y
## in svycoxph(formula = Surv(time = time, event = healtran) ~ move.y +
## race + educ + sex + age, design = des2, data = datcomb)
## Working 2logLR = 40.57694 p= 1.725e-09
## (scale factors: 1 0.98 ); denominator df= 8762
#regTermTest(fit2.full, ~move.y, method="LRT")
I tried both the schoenfeld residuals and the martingale residuals. The schoenfeld residuals were significant for each of the predictors and give evidence that this model is probably not appropriate for this data set.
attempted and failed. This data is really not appropriate for this method
#descriptive statistics
#datcomb <- datcomb %>%
# mutate(
# age_enter = age,
# age_exit = age+2
# )
#datcomb %>%
# group_by(time) %>%
# summarise(prop_event = mean(healtran))
#datcomb %>%
# group_by(time, move.y) %>%
# summarise(prop_event = mean(healtran))
desc.full <- aggregate(healtran~race*educ*move.y, datcomb, mean, na.rm=T)
desc.race.move <- aggregate(healtran~race*move.y, datcomb, mean, na.rm=T)
desc.edu.move <- aggregate(healtran~educ*move.y, datcomb, mean, na.rm=T)
aggregate(healtran~move.y*time, datcomb, mean, na.rm=T)
move.y time healtran
1 not move 1 0.00000000 2 not displaced 1 0.00000000 3 displaced 1 0.00000000 4 not move 2 0.08351384 5 not displaced 2 0.08423142 6 displaced 2 0.14444444 7 not move 3 0.08928977 8 not displaced 3 0.07559332 9 displaced 3 0.13310962 10 not move 4 0.07196743 11 not displaced 4 0.07872596 12 displaced 4 0.14285714 13 not move 5 0.06504841 14 not displaced 5 0.08327325 15 displaced 5 0.17233560
stargazer(desc.full, type = "html",style = "demography", title = "Proportion by race, education and move status", summary = F)
| race | educ | move.y | healtran | |
| 1 | nhw | hs+ | not move | 0.042 |
| 2 | nhb | hs+ | not move | 0.071 |
| 3 | hisp | hs+ | not move | 0.086 |
| 4 | asn | hs+ | not move | 0.045 |
| 5 | nh-other | hs+ | not move | 0.054 |
| 6 | nhw | lths | not move | 0.095 |
| 7 | nhb | lths | not move | 0.141 |
| 8 | hisp | lths | not move | 0.138 |
| 9 | asn | lths | not move | 0.113 |
| 10 | nh-other | lths | not move | 0.145 |
| 11 | nhw | hs+ | not displaced | 0.035 |
| 12 | nhb | hs+ | not displaced | 0.068 |
| 13 | hisp | hs+ | not displaced | 0.067 |
| 14 | asn | hs+ | not displaced | 0.071 |
| 15 | nh-other | hs+ | not displaced | 0.084 |
| 16 | nhw | lths | not displaced | 0.109 |
| 17 | nhb | lths | not displaced | 0.124 |
| 18 | hisp | lths | not displaced | 0.132 |
| 19 | asn | lths | not displaced | 0.196 |
| 20 | nh-other | lths | not displaced | 0.093 |
| 21 | nhw | hs+ | displaced | 0.091 |
| 22 | nhb | hs+ | displaced | 0.142 |
| 23 | hisp | hs+ | displaced | 0.126 |
| 24 | asn | hs+ | displaced | 0.142 |
| 25 | nh-other | hs+ | displaced | 0.098 |
| 26 | nhw | lths | displaced | 0.240 |
| 27 | nhb | lths | displaced | 0.194 |
| 28 | hisp | lths | displaced | 0.231 |
| 29 | asn | lths | displaced | 0.333 |
| 30 | nh-other | lths | displaced | 0.312 |
stargazer(desc.race.move, type = "html",style = "demography", title = "Proportion by race, education and move status", summary = F)
| race | move.y | healtran | |
| 1 | nhw | not move | 0.047 |
| 2 | nhb | not move | 0.086 |
| 3 | hisp | not move | 0.103 |
| 4 | asn | not move | 0.051 |
| 5 | nh-other | not move | 0.066 |
| 6 | nhw | not displaced | 0.042 |
| 7 | nhb | not displaced | 0.080 |
| 8 | hisp | not displaced | 0.082 |
| 9 | asn | not displaced | 0.081 |
| 10 | nh-other | not displaced | 0.085 |
| 11 | nhw | displaced | 0.108 |
| 12 | nhb | displaced | 0.156 |
| 13 | hisp | displaced | 0.157 |
| 14 | asn | displaced | 0.165 |
| 15 | nh-other | displaced | 0.133 |
stargazer(desc.edu.move, type = "html",style = "demography", title = "Proportion by race, education and move status", summary = F)
| educ | move.y | healtran | |
| 1 | hs+ | not move | 0.053 |
| 2 | lths | not move | 0.123 |
| 3 | hs+ | not displaced | 0.053 |
| 4 | lths | not displaced | 0.125 |
| 5 | hs+ | displaced | 0.116 |
| 6 | lths | displaced | 0.223 |
#discrete time model
fit3.dt<-svyglm(healtran~as.factor(time_start)+move.y+race-1+ educ-1, design=des2, family=binomial(link="cloglog"))
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
#summary(fit3.dt)
#stargazer(fit3.dt, type = "html",style = "demography", title = "Discrete time model with interactions for move status and education & move status and race", summary = F)
fit3.dt.int <-svyglm(healtran~as.factor(time_start)+move.y+race-1+ educ-1
+move.y*educ + move.y*race,
design=des2, family=binomial(link="cloglog"))
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
#summary(fit3.dt.int)
stargazer(fit3.dt, fit3.dt.int, type = "html",style = "demography", title = "Discrete time model with interactions for move status and education & move status and race", summary = F, align = T)
| healtran | ||
| Model 1 | Model 2 | |
| as.factor(time_start)0 | -20.060*** | -20.047*** |
| (0.122) | (0.124) | |
| as.factor(time_start)1 | -2.797*** | -2.781*** |
| (0.077) | (0.078) | |
| as.factor(time_start)2 | -2.755*** | -2.743*** |
| (0.067) | (0.070) | |
| as.factor(time_start)3 | -2.914*** | -2.900*** |
| (0.060) | (0.062) | |
| as.factor(time_start)4 | -3.008*** | -2.996*** |
| (0.119) | (0.121) | |
| move.ynot displaced | -0.088 | -0.244* |
| (0.068) | (0.098) | |
| move.ydisplaced | 0.592*** | 0.781*** |
| (0.101) | (0.134) | |
| racenhb | 0.491*** | 0.571*** |
| (0.076) | (0.093) | |
| racehisp | 0.448*** | 0.527*** |
| (0.088) | (0.110) | |
| raceasn | 0.036 | -0.093 |
| (0.142) | (0.161) | |
| racenh-other | 0.137 | 0.026 |
| (0.161) | (0.186) | |
| educlths | 0.837*** | 0.756*** |
| (0.060) | (0.068) | |
| move.ynot displaced:educlths | 0.343* | |
| (0.134) | ||
| move.ydisplaced:educlths | 0.166 | |
| (0.207) | ||
| move.ynot displaced:racenhb | 0.047 | |
| (0.164) | ||
| move.ydisplaced:racenhb | -0.952** | |
| (0.324) | ||
| move.ynot displaced:racehisp | -0.086 | |
| (0.198) | ||
| move.ydisplaced:racehisp | -0.721* | |
| (0.290) | ||
| move.ynot displaced:raceasn | 0.424* | |
| (0.200) | ||
| move.ydisplaced:raceasn | 0.382 | |
| (0.284) | ||
| move.ynot displaced:racenh-other | 0.530 | |
| (0.354) | ||
| move.ydisplaced:racenh-other | 0.324 | |
| (0.405) | ||
| N | 60,255 | 60,255 |
| Log Likelihood | -11,655.700 | -11,619.250 |
| AIC | 23,335.390 | 23,282.500 |
| p < .05; p < .01; p < .001 | ||
sums<-data.frame(round(summary(fit3.dt)$coef, 3)); sums$HR<-round(exp(sums[,1]), 3)
sums
Estimate Std..Error t.value Pr...t.. HR
as.factor(time_start)0 -20.060 0.122 -164.109 0.000 0.000 as.factor(time_start)1 -2.797 0.077 -36.213 0.000 0.061 as.factor(time_start)2 -2.755 0.067 -41.098 0.000 0.064 as.factor(time_start)3 -2.914 0.060 -48.713 0.000 0.054 as.factor(time_start)4 -3.008 0.119 -25.229 0.000 0.049 move.ynot displaced -0.088 0.068 -1.303 0.193 0.916 move.ydisplaced 0.592 0.101 5.870 0.000 1.808 racenhb 0.491 0.076 6.427 0.000 1.634 racehisp 0.448 0.088 5.063 0.000 1.565 raceasn 0.036 0.142 0.253 0.801 1.037 racenh-other 0.137 0.161 0.851 0.395 1.147 educlths 0.837 0.060 13.920 0.000 2.309
dat3.all<-expand.grid(time_start=as.factor(c(0,1,2,3,4)),healtran=c(0,1), race=levels(des2$variables$race), move.y=levels(des2$variables$move.y), educ = levels(des2$variables$educ))
head(dat3.all)
time_start healtran race move.y educ 1 0 0 nhw not move hs+ 2 1 0 nhw not move hs+ 3 2 0 nhw not move hs+ 4 3 0 nhw not move hs+ 5 4 0 nhw not move hs+ 6 0 1 nhw not move hs+
#Sys.setenv(R_MAX_VSIZE = 16e9)
dat3.all$fitted<-as.numeric(predict(fit3.dt.int, dat3.all, type="response"))
head(dat3.all)
time_start healtran race move.y educ fitted 1 0 0 nhw not move hs+ 1.967409e-09 2 1 0 nhw not move hs+ 6.008519e-02 3 2 0 nhw not move hs+ 6.231881e-02 4 3 0 nhw not move hs+ 5.352630e-02 5 4 0 nhw not move hs+ 4.877333e-02 6 0 1 nhw not move hs+ 1.967409e-09
#aggregate(healtran~race*educ*move.y, datcomb, mean, na.rm=T)
#aggregate(healtran~race*move.y, datcomb, mean, na.rm=T)
#aggregate(healtran~educ*move.y, datcomb, mean, na.rm=T)
#aggregate(healtran~move.y*time, datcomb, mean, na.rm=T)
#generating hazard plots
dat3.all%>%
mutate(
period = ifelse(time_start==1, "Period 1",
ifelse(time_start==2,"Period 2",
ifelse(time_start==3, "Period 3","Period 4")))
)%>%
ggplot()+geom_bar(aes(y=fitted,x=race,fill=race, group=race),stat="identity", position="dodge")+facet_grid(~move.y+educ)+ggtitle(label="Hazard of Transitioning into Poor Health by Race and Displaced status by wave")+
theme(axis.text.x = element_text(angle = 90))
dat3.all%>%
mutate(
period = ifelse(time_start==1, "Period 1",
ifelse(time_start==2,"Period 2",
ifelse(time_start==3, "Period 3","Period 4")))
)%>%
ggplot()+geom_bar(aes(y=fitted,x=educ,fill=educ, group=educ),stat="identity", position="dodge")+facet_wrap(~move.y+period, nrow=2)+ggtitle(label="Hazard of Transitioning into Poor Health by Education and Displaced status by wave")+
theme(axis.text.x = element_text(angle = 90))
#dat3.all%>%
#mutate(
# period = ifelse(time_start==1, "Period 1",
# ifelse(time_start==2,"Period 2",
# ifelse(time_start==3, "Period 3","Period 4")))
# )%>%
#ggplot()+geom_bar(aes(y=fitted,x=sex,fill=sex, group=sex),stat="identity", position="dodge")+facet_grid(~move.y+period)+ggtitle(label="Hazard of Transitioning into Poor Health by Sex and Displaced status by wave")+
#theme(axis.text.x = element_text(angle = 90))
see above
### b. Technical write up
I’m making comments below the output shown above to provide my technical write up so as not to duplicate output more than necessary.
###i. Tables of descriptive statistics
###ii. Results of statistical tests
###iii. Interpretation of the results
The purpose of this analysis is to determine differrnces of transitioning from good to poor health amongst three groups: 1)those who have not moved 2) those who have moved voluntarirly and 3) those who moved involuntarily (displaced). The data was obtained from the five PSID waves from 2005 through 2013. The PSID examines nationally representative individuals and families throughout the United States.
The outcome being examined is the transition form good health to poor health. The predictors examined in the analysis are race/ethnicity, education, sex and move status. The analysis examines the transition from good to poor health from 5 waves.
First, examining the proportions of the population by education and move status it can be seen that individuals with less than a high school education are more at risk moving voluntarily and for being displaced:
| educ | move.y | healtran | |
| 1 | hs+ | not move | 0.053 |
| 2 | lths | not move | 0.123 |
| 3 | hs+ | not displaced | 0.053 |
| 4 | lths | not displaced | 0.125 |
| 5 | hs+ | displaced | 0.116 |
| 6 | lths | displaced | 0.223 |
And next by race and move status, it can be seen that the differences by race are fairly minor, with non-Hispanic black, hispanic and asian being the most at risk:
| race | move.y | healtran | |
| 1 | nhw | not move | 0.047 |
| 2 | nhb | not move | 0.086 |
| 3 | hisp | not move | 0.103 |
| 4 | asn | not move | 0.051 |
| 5 | nh-other | not move | 0.066 |
| 6 | nhw | not displaced | 0.042 |
| 7 | nhb | not displaced | 0.080 |
| 8 | hisp | not displaced | 0.082 |
| 9 | asn | not displaced | 0.081 |
| 10 | nh-other | not displaced | 0.085 |
| 11 | nhw | displaced | 0.108 |
| 12 | nhb | displaced | 0.156 |
| 13 | hisp | displaced | 0.157 |
| 14 | asn | displaced | 0.165 |
| 15 | nh-other | displaced | 0.133 |
And finally by race, education and move status, ti can be seen that the combination of less than a high school education and a distinct race (such as asian or non-Hispanic other) are most at risk of being displaced.:
| race | educ | move.y | healtran | |
| 1 | nhw | hs+ | not move | 0.042 |
| 2 | nhb | hs+ | not move | 0.071 |
| 3 | hisp | hs+ | not move | 0.086 |
| 4 | asn | hs+ | not move | 0.045 |
| 5 | nh-other | hs+ | not move | 0.054 |
| 6 | nhw | lths | not move | 0.095 |
| 7 | nhb | lths | not move | 0.141 |
| 8 | hisp | lths | not move | 0.138 |
| 9 | asn | lths | not move | 0.113 |
| 10 | nh-other | lths | not move | 0.145 |
| 11 | nhw | hs+ | not displaced | 0.035 |
| 12 | nhb | hs+ | not displaced | 0.068 |
| 13 | hisp | hs+ | not displaced | 0.067 |
| 14 | asn | hs+ | not displaced | 0.071 |
| 15 | nh-other | hs+ | not displaced | 0.084 |
| 16 | nhw | lths | not displaced | 0.109 |
| 17 | nhb | lths | not displaced | 0.124 |
| 18 | hisp | lths | not displaced | 0.132 |
| 19 | asn | lths | not displaced | 0.196 |
| 20 | nh-other | lths | not displaced | 0.093 |
| 21 | nhw | hs+ | displaced | 0.091 |
| 22 | nhb | hs+ | displaced | 0.142 |
| 23 | hisp | hs+ | displaced | 0.126 |
| 24 | asn | hs+ | displaced | 0.142 |
| 25 | nh-other | hs+ | displaced | 0.098 |
| 26 | nhw | lths | displaced | 0.240 |
| 27 | nhb | lths | displaced | 0.194 |
| 28 | hisp | lths | displaced | 0.231 |
| 29 | asn | lths | displaced | 0.333 |
| 30 | nh-other | lths | displaced | 0.312 |
A Discrete Time model was first with the variables previously discussed (Model 1) then interatcions between both race and move status and education and ove status were included (Model 2):
| healtran | ||
| Model 1 | Model 2 | |
| as.factor(time_start)0 | -20.060*** | -20.047*** |
| (0.122) | (0.124) | |
| as.factor(time_start)1 | -2.797*** | -2.781*** |
| (0.077) | (0.078) | |
| as.factor(time_start)2 | -2.755*** | -2.743*** |
| (0.067) | (0.070) | |
| as.factor(time_start)3 | -2.914*** | -2.900*** |
| (0.060) | (0.062) | |
| as.factor(time_start)4 | -3.008*** | -2.996*** |
| (0.119) | (0.121) | |
| move.ynot displaced | -0.088 | -0.244* |
| (0.068) | (0.098) | |
| move.ydisplaced | 0.592*** | 0.781*** |
| (0.101) | (0.134) | |
| racenhb | 0.491*** | 0.571*** |
| (0.076) | (0.093) | |
| racehisp | 0.448*** | 0.527*** |
| (0.088) | (0.110) | |
| raceasn | 0.036 | -0.093 |
| (0.142) | (0.161) | |
| racenh-other | 0.137 | 0.026 |
| (0.161) | (0.186) | |
| educlths | 0.837*** | 0.756*** |
| (0.060) | (0.068) | |
| move.ynot displaced:educlths | 0.343* | |
| (0.134) | ||
| move.ydisplaced:educlths | 0.166 | |
| (0.207) | ||
| move.ynot displaced:racenhb | 0.047 | |
| (0.164) | ||
| move.ydisplaced:racenhb | -0.952** | |
| (0.324) | ||
| move.ynot displaced:racehisp | -0.086 | |
| (0.198) | ||
| move.ydisplaced:racehisp | -0.721* | |
| (0.290) | ||
| move.ynot displaced:raceasn | 0.424* | |
| (0.200) | ||
| move.ydisplaced:raceasn | 0.382 | |
| (0.284) | ||
| move.ynot displaced:racenh-other | 0.530 | |
| (0.354) | ||
| move.ydisplaced:racenh-other | 0.324 | |
| (0.405) | ||
| N | 60,255 | 60,255 |
| Log Likelihood | -11,655.700 | -11,619.250 |
| AIC | 23,335.390 | 23,282.500 |
| p < .05; p < .01; p < .001 | ||
From these two models, model two (the model with the interactions included) appears to fit the data better as it has a lower AIC. The only difference in terms of statistical significance with the covariates they share are the protective factor of moving but not being displaced over those who do not move. Being displaced means one has approximately %78 greater chance of transition from good health to poor health.
Examining are other covariates and it can be seen that being non-Hispanic black, Hispanic and havng less than a high school education all place an individual at greater risk of transitioning from good to poor health.
The most interesting interaction is that of non-Hispanic black with being displaced in that it is protective. One explanation for this may be that an already disadvantaged group may be more resililient to a negative life circumstance such as being displaced better than an advantaged group like non-Hispanic whites. likewise, being hispanic and being displaced also is protective versus non-hispanic white. The significant effect of the interactions is moving but not being displaced and being asian. These individuals are 36% more likely to transition from good to poor health.This is particularly strange, as this effect on the population at large is protective, so maybe non-Hispanic asians are much more bothered by moving that non-Hispanic whites. Next we shall examine a plot to see if these phenomenon can be observed:
Both of these phenomenon are readily apparent on the above plot. Both the Hispanic and non-Hispanic black fair far better than the non-Hispanic white population, while the asian population fare far worse.
Ultimately, there is sufficient evidence to support the claim that being displaced from one’s home increases the risk of transitioning from good to poor health, while there is some evidence to support the claim that moving voluntarily is protective of transitioning from good to poor health.