HW3

Author

Drew Schaefer

In this analysis I use NHIS data linked with mortality records by IPUMS. The outcome I am interested in is whether or not the survey respondent died. I chose covariates that are common in studies of heart disease, since that is what I eventually hope to study with these data, and I based them on covariates used in papers from the Framingham Heart Study: sex, age, BMI, obesity (a binary indicator derived from BMI), and self-rated health. I initially included other variables but removed several because they were not significant and were making the models take an extremely long time to run.

I am using a proportional hazards model because I am interested in the risk of death rather than in survival duration, which is what an accelerated failure time model would describe. I fit the model with three baseline distributions: exponential, Weibull, and piecewise constant hazard. The Weibull and piecewise constant fits were nearly identical, and I decided to continue with the Weibull because it ran more reliably in R without causing errors. The lognormal, loglogistic, and Gompertz distributions did not work for me in R; every attempt I made ended in a time-out error.
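To make the comparison concrete, here is a minimal sketch of fitting the three baseline distributions with `phreg()` from the eha package. It is not the code used for the results in this report. The data frame `toy`, its `time` column, the simulated values, and the cut points are all hypothetical stand-ins; only the covariate names `AGE` and `OBESE` and the event indicator `d.event` are borrowed from the output shown in this report.

```r
library(survival)  # Surv()
library(eha)       # phreg()

# Hypothetical simulated data standing in for the NHIS extract
set.seed(1)
n <- 500
toy <- data.frame(AGE = runif(n, 20, 80), OBESE = rbinom(n, 1, 0.25))
rate <- 0.02 * exp(0.03 * (toy$AGE - 50) + 0.2 * toy$OBESE)
t_death <- rexp(n, rate = rate)
toy$d.event <- as.integer(t_death < 30)  # 1 = died before follow-up ended
toy$time <- pmin(t_death, 30)            # follow-up censored at 30 years

# Exponential: a Weibull baseline with the shape parameter fixed at 1
fit.exp <- phreg(Surv(time, d.event) ~ AGE + OBESE, data = toy,
                 dist = "weibull", shape = 1)

# Weibull: shape parameter estimated from the data
fit.wei <- phreg(Surv(time, d.event) ~ AGE + OBESE, data = toy,
                 dist = "weibull")

# Piecewise constant hazard: the baseline hazard may jump at each cut point
fit.pch <- phreg(Surv(time, d.event) ~ AGE + OBESE, data = toy,
                 dist = "pch", cuts = c(10, 20))

# Maximized log-likelihoods of the three fits
print(c(exponential = fit.exp$loglik[2],
        weibull     = fit.wei$loglik[2],
        pch         = fit.pch$loglik[2]))
```

Because the exponential model is a special case of the Weibull, the Weibull log-likelihood should never be lower than the exponential one.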

All of the variables I included in the model were significant. This is likely because I didn’t put in any “weird” variables, only variables that are typically used in mortality analysis. In the future I’ll have to find more creative variables for any outcomes I am interested in. I also tested for an interaction between age and obesity, which was significant as well. I think this makes sense because obesity is measured only once, at the time of the survey: someone who was young during the survey may later move out of the obesity classification, whereas an older adult may be less likely to do that.

Covariate             Mean       Coef     Rel.Risk   S.E.    LR p
SEX                   1.556    -0.250     0.779     0.008   0.0000 
AGE                  53.452     0.050     1.051     0.000   0.0000 
OBESE                 0.258     0.123     1.131     0.014   0.0000 
BMICALC              27.265    -0.015     0.985     0.001   0.0000 
GOOD_HEALTH           0.844    -0.602     0.548     0.008   0.0000 

Events                    69625 
Total time at risk        30333012 
Max. log. likelihood      -464655 
LR test statistic         56143.79 
Degrees of freedom        5 
Overall p-value           0
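
The Rel.Risk column in the table directly above is the exponentiated coefficient, so each value can be checked by hand. A minimal base R sketch, with the coefficients copied from that table (the vector name `coefs` is only illustrative):

```r
# Coefficients copied from the table above
coefs <- c(SEX = -0.250, AGE = 0.050, OBESE = 0.123,
           BMICALC = -0.015, GOOD_HEALTH = -0.602)

# Relative risk (hazard ratio) per one-unit increase in each covariate
rel_risk <- round(exp(coefs), 3)
print(rel_risk)
```

Read this way, the GOOD_HEALTH coefficient of -0.602 corresponds to roughly a 45% lower hazard of death, holding the other covariates fixed.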

Covariate             Mean       Coef     Rel.Risk   S.E.    LR p
SEX                   1.556    -0.360     0.698     0.008   0.0000 
AGE                  53.452    -0.093     0.911     0.001   0.0000 
OBESE                 0.258     0.228     1.256     0.014   0.0000 
BMICALC              27.265    -0.023     0.977     0.001   0.0000 
GOOD_HEALTH           0.844    -0.758     0.469     0.008   0.0000 

Events                    69625 
Total time at risk        30333012 
Max. log. likelihood      -321528 
LR test statistic         43412.17 
Degrees of freedom        5 
Overall p-value           0

Covariate             Mean       Coef     Rel.Risk   S.E.    LR p
SEX                   1.556    -0.360     0.698     0.008   0.0000 
AGE                  53.452    -0.093     0.911     0.001   0.0000 
OBESE                 0.258     0.228     1.256     0.014   0.0000 
BMICALC              27.265    -0.023     0.977     0.001   0.0000 
GOOD_HEALTH           0.844    -0.758     0.469     0.008   0.0000 

Events                    69625 
Total time at risk        30333012 
Max. log. likelihood      -321528 
LR test statistic         43412.17 
Degrees of freedom        5 
Overall p-value           0
plot(fit.5, fn = "haz", main = "Piecewise")

[1] 929322.1
[1] 643070.3
[1] 643070.3
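
The three unlabeled values above look like AIC values, one per fitted model; that is an assumption, since the output does not label them. Under that reading they can be approximately reproduced from the printed log-likelihoods with AIC = -2 * logLik + 2 * k. The parameter counts `k` below are also assumptions: five covariates plus one baseline parameter for the first model and two for the other two.

```r
# Max. log. likelihood values copied from the three summaries (printed rounded)
loglik <- c(-464655, -321528, -321528)

# ASSUMED number of estimated parameters per model (not stated in the output)
k <- c(6, 7, 7)

# AIC = -2 * log-likelihood + 2 * number of parameters
aic <- -2 * loglik + 2 * k
print(aic)
```

Any small gap between these numbers and the printed ones would come from the log-likelihoods being rounded to whole numbers in the summaries. A lower value indicates a better trade-off between fit and complexity.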


Call:
lm(formula = death_age ~ AGE * OBESE * GOOD_HEALTH * SEX, data = nhis_dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-11.9973  -4.6149  -0.6915   4.7162  15.3481 

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)    
(Intercept)               14.336922   0.316672  45.274  < 2e-16 ***
AGE                        0.873753   0.005154 169.516  < 2e-16 ***
OBESE                     -0.952327   0.561209  -1.697 0.089712 .  
GOOD_HEALTH               -1.250379   0.327596  -3.817 0.000135 ***
SEX                        0.152952   0.190525   0.803 0.422095    
AGE:OBESE                  0.013269   0.009513   1.395 0.163079    
AGE:GOOD_HEALTH            0.049074   0.005430   9.038  < 2e-16 ***
OBESE:GOOD_HEALTH          0.983165   0.593382   1.657 0.097544 .  
AGE:SEX                    0.010506   0.003091   3.399 0.000676 ***
OBESE:SEX                 -0.631319   0.329884  -1.914 0.055651 .  
GOOD_HEALTH:SEX           -0.038874   0.197342  -0.197 0.843836    
AGE:OBESE:GOOD_HEALTH     -0.024864   0.010272  -2.421 0.015498 *  
AGE:OBESE:SEX              0.009351   0.005583   1.675 0.093976 .  
AGE:GOOD_HEALTH:SEX       -0.006589   0.003260  -2.021 0.043240 *  
OBESE:GOOD_HEALTH:SEX      0.059946   0.350203   0.171 0.864085    
AGE:OBESE:GOOD_HEALTH:SEX  0.000710   0.006057   0.117 0.906677    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.542 on 525356 degrees of freedom
Multiple R-squared:  0.9015,    Adjusted R-squared:  0.9015 
F-statistic: 3.204e+05 on 15 and 525356 DF,  p-value: < 2.2e-16
Single term deletions

Model:
Surv(death_age, d.event) ~ AGE * OBESE
          Df    AIC LRT Pr(>Chi)    
<none>       652497                 
AGE:OBESE  1 652915 420   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Covariate             Mean       Coef     Rel.Risk   S.E.    Wald p
AGE                  53.452    -0.085     0.919     0.001   0.0000 
OBESE                 0.258     1.100     3.003     0.049   0.0000 
AGE:OBESE                      -0.015     0.985     0.001   0.0000 

Events                    69625 
Total time at risk        30333012 
Max. log. likelihood      -326244 
LR test statistic         33981.04 
Degrees of freedom        3 
Overall p-value           0
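
With an interaction in the model, the relative risk for obesity is no longer a single number: it depends on age. A small base R sketch, using the OBESE and AGE:OBESE coefficients from the table above (the ages chosen and the variable names are only illustrative):

```r
# Coefficients copied from the interaction model above
b_obese <- 1.100   # OBESE main effect
b_inter <- -0.015  # AGE:OBESE interaction

# Hazard ratio for obese vs. non-obese respondents at selected ages
ages <- c(30, 50, 70)
hr_obese <- exp(b_obese + b_inter * ages)
names(hr_obese) <- paste0("age_", ages)
print(round(hr_obese, 2))

# Age at which the estimated obesity effect reaches a hazard ratio of 1
crossover_age <- -b_obese / b_inter
print(crossover_age)
```

Under these estimates the excess hazard associated with obesity shrinks as AGE increases and disappears at about age 73.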