HW3
In this analysis I use data from the National Health Interview Survey (NHIS) linked with mortality data by IPUMS. The outcome I am interested in is whether or not the survey respondent died. Because I hope to eventually study heart disease with this data, I chose covariates used in heart disease research, basing them on several covariates used in papers from the Framingham Heart Study: sex, age, BMI, obesity (a binary indicator derived from BMI), and self-rated health. I initially included other variables but removed several because they were not significant and made the models take an extremely long time to run.
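A minimal sketch of how the two binary covariates could be derived. It assumes the conventional BMI ≥ 30 cutoff for obesity and a hypothetical 1-5 self-rated health code (1 = excellent, 5 = poor) collapsed into good-or-better versus fair/poor; the example rows, the data frame name `nhis_example`, and the health coding are illustrative assumptions, not the recodes actually used here.

```r
# Hypothetical example rows; BMICALC and HEALTH mimic IPUMS-style columns
nhis_example <- data.frame(
  BMICALC = c(22.4, 31.0, 29.9, 35.2),
  HEALTH  = c(1, 4, 3, 5)   # assumed coding: 1 = excellent ... 5 = poor
)

# Obesity as a binary indicator using the conventional BMI >= 30 cutoff
nhis_example$OBESE <- as.integer(nhis_example$BMICALC >= 30)

# Self-rated health collapsed to good-or-better (codes 1-3) vs fair/poor (4-5)
nhis_example$GOOD_HEALTH <- as.integer(nhis_example$HEALTH <= 3)

print(nhis_example)
```

With a real extract, any missing or not-in-universe codes in BMICALC and HEALTH should be set to NA before these comparisons, otherwise they would be silently assigned to one of the categories.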
I am using a proportional hazards model because I am interested in how the covariates change the risk (hazard) of death, rather than how they lengthen or shorten survival time, which is what an accelerated failure time model describes. I fit the model with three different baseline distributions: exponential, Weibull, and piecewise constant hazard. The Weibull and piecewise constant models gave extremely similar results, and I continued with the Weibull because it ran more reliably in R without errors. The lognormal, loglogistic, and Gompertz distributions did not work for me in R: every attempt I made ended in a time-out error.
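The Weibull family, which includes the exponential, is the only one for which a proportional hazards model is also an accelerated failure time model, so for these two distributions the PH and AFT framings describe the same fit and differ only in how the coefficients are parameterized. A minimal sketch on simulated data (the sample size, parameter values, and censoring time are assumptions chosen for illustration) using `survival::survreg`, which reports AFT coefficients; for these distributions the PH log-hazard ratio is -coef / scale. The sketch also compares the exponential and Weibull fits by AIC.

```r
library(survival)

set.seed(42)
n <- 5000
x <- rbinom(n, 1, 0.3)                    # hypothetical binary covariate
shape <- 1.5; rate <- 0.5; log_hr <- 0.5  # assumed true Weibull PH parameters

# Invert S(t | x) = exp(-rate * t^shape * exp(log_hr * x)) to draw event times
t_event  <- (-log(runif(n)) / (rate * exp(log_hr * x)))^(1 / shape)
obs_time <- pmin(t_event, 2)              # administrative censoring at t = 2
status   <- as.integer(t_event <= 2)

fits <- list(
  exponential = survreg(Surv(obs_time, status) ~ x, dist = "exponential"),
  weibull     = survreg(Surv(obs_time, status) ~ x, dist = "weibull")
)

# survreg reports AFT coefficients; the PH log-hazard ratio is -coef / scale
# (survreg fixes scale at 1 for the exponential distribution)
ph_beta <- sapply(fits, function(f) -coef(f)[["x"]] / f$scale)
aic     <- sapply(fits, AIC)

print(round(ph_beta, 3))
print(round(aic, 1))
```

With data simulated this way, the Weibull estimate should land close to the assumed log-hazard ratio of 0.5, and the Weibull AIC should be clearly lower than the exponential one because the simulated hazard is not constant over time (shape 1.5).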
All of the variables I included in the model were significant. This is likely because I didn't include any unusual variables, only ones that are typically used in mortality analysis; in addition, with a sample this large (69,625 observed deaths), even small effects reach statistical significance. In the future I will need to find more creative variables for the outcomes I am interested in. I also tested for an interaction between age and obesity, which was also significant. I think this makes sense because obesity is measured only at the time of the survey, so we know each respondent's obesity status at a single point in time: someone who was young at the survey may later move out of the obese category, whereas an older adult is less likely to do so.
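The interaction test can be carried out as a likelihood-ratio test comparing models with and without the age-by-obesity term, which is what the `drop1` output below reports. A minimal sketch on simulated data; all variable values and coefficients are assumptions, and `survival::survreg` with a Weibull distribution stands in for the model actually fitted here.

```r
library(survival)

set.seed(1)
n <- 5000
age   <- runif(n, 20, 80)       # hypothetical age at survey
obese <- rbinom(n, 1, 0.25)     # hypothetical obesity indicator

# Assumed true Weibull PH model (shape 1.5) with an age-by-obesity interaction
lp <- 0.05 * age + 1.0 * obese - 0.015 * age * obese
t_event  <- (-log(runif(n)) / (0.001 * exp(lp)))^(1 / 1.5)
obs_time <- pmin(t_event, 20)   # administrative censoring at t = 20
status   <- as.integer(t_event <= 20)

reduced <- survreg(Surv(obs_time, status) ~ age + obese, dist = "weibull")
full    <- survreg(Surv(obs_time, status) ~ age * obese, dist = "weibull")

# Likelihood-ratio test of the single interaction term (1 degree of freedom)
lr_stat <- 2 * (as.numeric(logLik(full)) - as.numeric(logLik(reduced)))
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
cat("LR statistic:", round(lr_stat, 2), "p-value:", signif(p_value, 3), "\n")
```

The same relationship shows up in the `drop1` output: the AIC difference equals the LR statistic minus twice the degrees of freedom, 652915 - 652497 = 418 = 420 - 2 × 1.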
Exponential model
Covariate        Mean     Coef Rel.Risk     S.E.     LR p
SEX             1.556   -0.250    0.779    0.008   0.0000
AGE            53.452    0.050    1.051    0.000   0.0000
OBESE           0.258    0.123    1.131    0.014   0.0000
BMICALC        27.265   -0.015    0.985    0.001   0.0000
GOOD_HEALTH     0.844   -0.602    0.548    0.008   0.0000

Events                    69625
Total time at risk     30333012
Max. log. likelihood    -464655
LR test statistic      56143.79
Degrees of freedom            5
Overall p-value               0
Weibull model
Covariate        Mean     Coef Rel.Risk     S.E.     LR p
SEX             1.556   -0.360    0.698    0.008   0.0000
AGE            53.452   -0.093    0.911    0.001   0.0000
OBESE           0.258    0.228    1.256    0.014   0.0000
BMICALC        27.265   -0.023    0.977    0.001   0.0000
GOOD_HEALTH     0.844   -0.758    0.469    0.008   0.0000

Events                    69625
Total time at risk     30333012
Max. log. likelihood    -321528
LR test statistic      43412.17
Degrees of freedom            5
Overall p-value               0
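The Rel.Risk column is exp(Coef), the hazard ratio for a one-unit increase in the covariate. A short sketch using the rounded Weibull coefficients and standard errors from the table above, which also adds approximate 95% Wald confidence intervals, exp(Coef ± 1.96 × S.E.); because the inputs are rounded to three decimals, the results match the printed values only to about the third decimal.

```r
# Rounded Weibull PH estimates copied from the table above
coefs <- c(SEX = -0.360, AGE = -0.093, OBESE = 0.228,
           BMICALC = -0.023, GOOD_HEALTH = -0.758)
se    <- c(SEX = 0.008, AGE = 0.001, OBESE = 0.014,
           BMICALC = 0.001, GOOD_HEALTH = 0.008)

# Hazard ratio (relative risk) for a one-unit change in each covariate
rel_risk <- exp(coefs)

# Approximate 95% Wald confidence interval on the hazard-ratio scale
ci <- cbind(lower = exp(coefs - 1.96 * se), upper = exp(coefs + 1.96 * se))

print(round(cbind(rel_risk, ci), 3))
```

For example, the GOOD_HEALTH relative risk of 0.469 means respondents coded 1 have roughly a 53% lower hazard of death than those coded 0, holding the other covariates fixed.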
Piecewise constant hazard model
Covariate        Mean     Coef Rel.Risk     S.E.     LR p
SEX             1.556   -0.360    0.698    0.008   0.0000
AGE            53.452   -0.093    0.911    0.001   0.0000
OBESE           0.258    0.228    1.256    0.014   0.0000
BMICALC        27.265   -0.023    0.977    0.001   0.0000
GOOD_HEALTH     0.844   -0.758    0.469    0.008   0.0000

Events                    69625
Total time at risk     30333012
Max. log. likelihood    -321528
LR test statistic      43412.17
Degrees of freedom            5
Overall p-value               0
plot(fit.5, fn = "haz", main = "Piecewise")
[Figure: "Piecewise", plot of the estimated hazard (fn = "haz") for fit.5]
[1] 929322.1
[1] 643070.3
[1] 643070.3
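The three unlabeled numbers printed after the plot call appear to be AIC values. Assuming AIC = -2 log L + 2k, with k = 6 estimated parameters for the exponential fit (five coefficients plus one rate) and k = 7 for the Weibull fit (plus a shape parameter), the rounded maximum log-likelihoods from the tables reproduce the first two values to within rounding. The parameter counts are an assumption, not something stated in the output.

```r
# Maximum log-likelihoods as printed in the exponential and Weibull tables
loglik <- c(exponential = -464655, weibull = -321528)
# Assumed number of estimated parameters: 5 coefficients + baseline parameters
k <- c(exponential = 6, weibull = 7)

aic <- -2 * loglik + 2 * k
print(aic)
```

A lower AIC indicates a better fit after penalizing for the number of parameters, so on this measure the Weibull model is far preferable to the exponential one.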
Call:
lm(formula = death_age ~ AGE * OBESE * GOOD_HEALTH * SEX, data = nhis_dat)
Residuals:
Min 1Q Median 3Q Max
-11.9973 -4.6149 -0.6915 4.7162 15.3481
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.336922 0.316672 45.274 < 2e-16 ***
AGE 0.873753 0.005154 169.516 < 2e-16 ***
OBESE -0.952327 0.561209 -1.697 0.089712 .
GOOD_HEALTH -1.250379 0.327596 -3.817 0.000135 ***
SEX 0.152952 0.190525 0.803 0.422095
AGE:OBESE 0.013269 0.009513 1.395 0.163079
AGE:GOOD_HEALTH 0.049074 0.005430 9.038 < 2e-16 ***
OBESE:GOOD_HEALTH 0.983165 0.593382 1.657 0.097544 .
AGE:SEX 0.010506 0.003091 3.399 0.000676 ***
OBESE:SEX -0.631319 0.329884 -1.914 0.055651 .
GOOD_HEALTH:SEX -0.038874 0.197342 -0.197 0.843836
AGE:OBESE:GOOD_HEALTH -0.024864 0.010272 -2.421 0.015498 *
AGE:OBESE:SEX 0.009351 0.005583 1.675 0.093976 .
AGE:GOOD_HEALTH:SEX -0.006589 0.003260 -2.021 0.043240 *
OBESE:GOOD_HEALTH:SEX 0.059946 0.350203 0.171 0.864085
AGE:OBESE:GOOD_HEALTH:SEX 0.000710 0.006057 0.117 0.906677
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.542 on 525356 degrees of freedom
Multiple R-squared: 0.9015, Adjusted R-squared: 0.9015
F-statistic: 3.204e+05 on 15 and 525356 DF, p-value: < 2.2e-16
Single term deletions
Model:
Surv(death_age, d.event) ~ AGE * OBESE
Df AIC LRT Pr(>Chi)
<none> 652497
AGE:OBESE 1 652915 420 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Model with AGE * OBESE interaction
Covariate        Mean     Coef Rel.Risk     S.E.   Wald p
AGE            53.452   -0.085    0.919    0.001   0.0000
OBESE           0.258    1.100    3.003    0.049   0.0000
AGE:OBESE               -0.015    0.985    0.001   0.0000

Events                    69625
Total time at risk     30333012
Max. log. likelihood    -326244
LR test statistic      33981.04
Degrees of freedom            3
Overall p-value               0
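Because the model includes an AGE:OBESE term, the hazard ratio for obesity depends on age at the survey: HR(age) = exp(b_OBESE + b_AGE:OBESE × age). A short worked sketch using the rounded coefficients from the table above; the ages chosen are arbitrary illustrations.

```r
# Rounded coefficients from the AGE * OBESE interaction model above
b_obese <- 1.100
b_int   <- -0.015

# Hazard ratio for obese vs non-obese respondents at selected survey ages
age <- c(30, 50, 70)
hr_obese <- exp(b_obese + b_int * age)
print(round(setNames(hr_obese, paste0("age_", age)), 3))
```

This gives roughly 1.92 at age 30, 1.42 at age 50, and 1.05 at age 70, so with these estimates the relative effect of obesity on the hazard shrinks with age at the survey.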