DEM7223 Event History Analysis Fall 2024 Problem Set 2 Univariate Event History Analysis

Author

Jacob M. Souch

Import Data

library(survival)
library(tidyverse)
Warning: package 'ggplot2' was built under R version 4.3.2
Warning: package 'readr' was built under R version 4.3.2
Warning: package 'purrr' was built under R version 4.3.2
Warning: package 'dplyr' was built under R version 4.3.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(survminer)
Warning: package 'survminer' was built under R version 4.3.2
Loading required package: ggpubr
Warning: package 'ggpubr' was built under R version 4.3.2

Attaching package: 'survminer'

The following object is masked from 'package:survival':

    myeloma
library(ggsurvfit)
library(eha)
Warning: package 'eha' was built under R version 4.3.3
library(survival)
library(readr)
MHAS0121 <- read_csv("C:/Users/jacob/Downloads/MHAS0121_NOLABELS.csv")
Rows: 9636 Columns: 39
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (39): id, locsize01, perwght01, died18, died21, mobirth, yrbirth, female...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
fit <- with(MHAS0121, Surv(agecensor,died0121))

ID – Unique person identifier

FEMALE – dummy variable for whether the sampled individual identifies as a woman or a man (no other options; note that in some in-person questionnaires, this is based on external classification by interviewers as opposed to self-reported by interviewees).

SCHOOLING – Years of formal schooling.

EDUCLEVEL – Schooling variable coding in intervals, including: no formal schooling (0), 1 – 5 years (15), 6-8 years (68), and 9 or more years (because it is topped at 19 years, value is 919).

LOCSIZE01 – Size of the locality of residence at baseline in 2001, ranging from 100,000+ (1), 15,000 – 99,999 (2), 2,500 – 14,999 (3), and less than 2,500 (4). In Mexico, urban areas are defined as those having at least 2,500 inhabitants (one of the lowest bars for urban I have ever encountered).

MOBIRTH – Month of birth

YRBIRTH – Year of birth

MOINT – Month of baseline interview (in 2001)

YRINT – Year of baseline interview (i.e., 2001)

DIED0121 – Dummy variable for whether person passed away in the inter-wave period (otherwise,censored at re-interview, lost to follow-up, or refused interview, all coded as 0)

MOCENSOR – Month in which person was ‘observed’ for the last time (whether because s/he passed

away, was re-interviewed, rejected the survey, or when we assume was lost to follow-up). YRCENSOR – Year in which person was ‘observed’ for the last time (see MOCENSOR). We will also create other variables in Stata that we will need for the exercise (see accompanying Stata do file):

AGEINT – Age at which person was interviewed at baseline

AGECENSOR – Age the interviewee was when we observed them last.


(1) the variable that measures the time each person was exposed to the risk of death before dying/being censored for other reasons.

Agecensor is the variabel that measures the time each person was exposed to the risk of death before dying or censoring.

(2) the variable that indicates whether the event occurred or not (i.e., whether the person died in the interval); Died0121 is the variable that indicates whether or not th person has died.

(3)the time in which the person entered and exited the risk set. After doing this (or, without it, if your package does not require it):

Data Preparation

MHAS0121$sex <- abs(as.numeric(MHAS0121$female) - 1)
MHAS0121$locsize01 <- as.factor(MHAS0121$locsize01)
MHAS0121$educlevel <- as.factor(MHAS0121$educlevel)

Models

I. Multivariate parametric proportional hazards models

A. Intercept-Only Exponential Model

Estimate parametric proportional hazards model with an exponential (i.e., constant) with no covariates. Instead of hazard ratios, obtain coefficients. Exponentiate the intercept of the model. (Later on, we will compare this to the crude death rate for people 50+ we will calculate in a later problem set).

model01 <- phreg(formula = Surv(agecensor, died0121) ~ 1, data = MHAS0121, dist = "weibull", shape =1) 
model01
Call:
phreg(formula = Surv(agecensor, died0121) ~ 1, data = MHAS0121, 
    dist = "weibull", shape = 1)

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
log(scale)                    4.999               0.016     0.000 

 Shape is fixed at  1 

Events                    3994 
Total time at risk        592255 
Max. log. likelihood      -23961 
stargazer::stargazer(model01, type="text",  out="models.txt",apply.coef = exp)

==========================================
                   Dependent variable:    
               ---------------------------
                        agecensor         
------------------------------------------
log(scale)             148.286***         
                         (0.016)          
                                          
------------------------------------------
Observations              7,565           
Log Likelihood         -23,960.580        
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01
extractAIC(model01)
[1]     1.00 47923.16

B. Female Covariate Exponential Model

Estimate another exponential, proportional hazards model, using FEMALE as the only covariate. Obtain coefficients, not hazard ratios

model02 <- phreg(formula = Surv(agecensor, died0121) ~ factor(sex), data = MHAS0121, dist = "weibull", shape =1) 
model02
Call:
phreg(formula = Surv(agecensor, died0121) ~ factor(sex), data = MHAS0121, 
    dist = "weibull", shape = 1)

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
factor(sex) 
               0    0.549     0         1           (reference)
               1    0.451     0.099     1.104     0.032     0.002 

log(scale)                    5.045               0.022     0.000 

 Shape is fixed at  1 

Events                    3994 
Total time at risk        592255 
Max. log. likelihood      -23956 
LR test statistic         9.74 
Degrees of freedom        1 
Overall p-value           0.00179989
stargazer::stargazer(model02, type="text",  out="models.txt",apply.coef = exp)

==========================================
                   Dependent variable:    
               ---------------------------
                        agecensor         
------------------------------------------
factor(sex)1            1.104***          
                         (0.032)          
                                          
log(scale)             155.240***         
                         (0.022)          
                                          
------------------------------------------
Observations              7,565           
Log Likelihood         -23,955.710        
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01
extractAIC(model02)
[1]     2.00 47915.42

i. Interpret the coefficient for FEMALE after exponentiating it.


ii. Exponentiate the intercept of this model and interpret this figure.

iii. Sum the constant and the coefficient for female, and exponentiate this number. What is
this equivalent to/how would you interpret it?

Estimate another PH parametric model controlling for FEMALE, SCHOOLING (or categories of EDUCLEVEL, whatever you prefer, justifying your choice), and LOCSIZE01. Assuming the baseline hazard is distributed:

C. Multiple Models (Schooling)

i. Exponential Model (Covariates, schooling)

model03 <- phreg(formula = Surv(agecensor, died0121) ~ sex + schooling + locsize01, data = MHAS0121, dist = "weibull", shape =1) 
model03
Call:
phreg(formula = Surv(agecensor, died0121) ~ sex + schooling + 
    locsize01, data = MHAS0121, dist = "weibull", shape = 1)

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
sex                 0.450     0.134     1.144     0.032     0.000 
schooling           4.175    -0.038     0.963     0.004     0.000 
locsize01 
               1    0.572     0         1           (reference)
               2    0.154    -0.023     0.978     0.045     0.620 
               3    0.099    -0.047     0.954     0.055     0.386 
               4    0.175    -0.138     0.871     0.045     0.002 

log(scale)                    4.884               0.031     0.000 

 Shape is fixed at  1 

Events                    3983 
Total time at risk        591241 
Max. log. likelihood      -23852 
LR test statistic         92.60 
Degrees of freedom        5 
Overall p-value           0
stargazer::stargazer(model03, type="text",  out="models.txt",apply.coef = exp)

==========================================
                   Dependent variable:    
               ---------------------------
                        agecensor         
------------------------------------------
sex                     1.144***          
                         (0.032)          
                                          
schooling               0.963***          
                         (0.004)          
                                          
locsize012              0.978***          
                         (0.045)          
                                          
locsize013              0.954***          
                         (0.055)          
                                          
locsize014              0.871***          
                         (0.045)          
                                          
log(scale)             132.128***         
                         (0.031)          
                                          
------------------------------------------
Observations              7,552           
Log Likelihood         -23,852.450        
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01
extractAIC(model03)
[1]     6.00 47716.91

ii. Gompertz Model (Covariates, schooling)

model04 <- phreg(formula = Surv(agecensor, died0121) ~ sex + schooling + locsize01, data = MHAS0121, dist = "gompertz") 
Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(Y[strata == i, 2], shape = shape, scale = scale, param =
"canonical"): Non-positive shape or scale
Warning in Hgompertz(Y[strata == i, 1], shape = shape, scale = scale, param =
"canonical"): Non-positive shape or scale
scale =  0 
Warning in hgompertz(Y[strata == i, 2], shape = shape, scale = scale, log =
TRUE, : Non-positive scale
model04
Call:
phreg(formula = Surv(agecensor, died0121) ~ sex + schooling + 
    locsize01, data = MHAS0121, dist = "gompertz")

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
sex                 0.450     0.272     1.312     0.032     0.000 
schooling           4.175     0.017     1.017     0.004     0.000 
locsize01 
               1    0.572     0         1           (reference)
               2    0.154     0.036     1.037     0.046     0.427 
               3    0.099    -0.100     0.905     0.055     0.068 
               4    0.175    -0.140     0.869     0.046     0.002 

log(scale)                    2.181               0.011     0.000 
log(shape)                  -10.105               0.111     0.000 

Events                    3983 
Total time at risk        591241 
Max. log. likelihood      -17275 
LR test statistic         109.69 
Degrees of freedom        5 
Overall p-value           0
stargazer::stargazer(model04, type="text",  out="models.txt",apply.coef = exp)

==========================================
                   Dependent variable:    
               ---------------------------
                        agecensor         
------------------------------------------
sex                     1.312***          
                         (0.032)          
                                          
schooling               1.017***          
                         (0.004)          
                                          
locsize012              1.037***          
                         (0.046)          
                                          
locsize013              0.905***          
                         (0.055)          
                                          
locsize014              0.869***          
                         (0.046)          
                                          
log(scale)              8.859***          
                         (0.011)          
                                          
log(shape)               0.00004          
                         (0.111)          
                                          
------------------------------------------
Observations              7,552           
Log Likelihood         -17,274.650        
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01
extractAIC(model04)
[1]     7.00 34563.31

iii. Weibull Model (Covariates, schooling)

model05 <- phreg(formula = Surv(agecensor, died0121) ~ sex + schooling + locsize01, data = MHAS0121, dist = "weibull") 
model05
Call:
phreg(formula = Surv(agecensor, died0121) ~ sex + schooling + 
    locsize01, data = MHAS0121, dist = "weibull")

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
sex                 0.450     0.272     1.313     0.032     0.000 
schooling           4.175     0.014     1.014     0.004     0.001 
locsize01 
               1    0.572     0         1           (reference)
               2    0.154     0.026     1.027     0.046     0.562 
               3    0.099    -0.098     0.906     0.055     0.073 
               4    0.175    -0.143     0.866     0.046     0.002 

log(scale)                    4.490               0.003     0.000 
log(shape)                    2.233               0.012     0.000 

Events                    3983 
Total time at risk        591241 
Max. log. likelihood      -17146 
LR test statistic         102.11 
Degrees of freedom        5 
Overall p-value           0
stargazer::stargazer(model05, type="text",  out="models.txt",apply.coef = exp)

==========================================
                   Dependent variable:    
               ---------------------------
                        agecensor         
------------------------------------------
sex                     1.313***          
                         (0.032)          
                                          
schooling               1.014***          
                         (0.004)          
                                          
locsize012              1.027***          
                         (0.046)          
                                          
locsize013              0.906***          
                         (0.055)          
                                          
locsize014              0.866***          
                         (0.046)          
                                          
log(scale)              89.119***         
                         (0.003)          
                                          
log(shape)              9.332***          
                         (0.012)          
                                          
------------------------------------------
Observations              7,552           
Log Likelihood         -17,146.390        
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01
extractAIC(model05)
[1]     7.00 34306.79

iv. Log-Logistic Model (Covariates, schooling)

model06 <- phreg(formula = Surv(agecensor, died0121) ~ sex + schooling + locsize01, data = MHAS0121, dist = "loglogistic") 
model06
Call:
phreg(formula = Surv(agecensor, died0121) ~ sex + schooling + 
    locsize01, data = MHAS0121, dist = "loglogistic")

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
(Intercept)                   1.393               0.213     0.000 
sex                 0.450     0.272     1.313     0.032     0.000 
schooling           4.175     0.012     1.012     0.004     0.003 
locsize01 
               1    0.572     0         1           (reference)
               2    0.154     0.020     1.020     0.046     0.658 
               3    0.099    -0.096     0.908     0.055     0.078 
               4    0.175    -0.144     0.866     0.046     0.002 

log(scale)                    4.611               0.026     0.000 
log(shape)                    2.315               0.019     0.000 

Events                    3983 
Total time at risk        591241 
Max. log. likelihood      -17130 
LR test statistic         97.61 
Degrees of freedom        5 
Overall p-value           0
stargazer::stargazer(model06, type="text", out="models.txt", apply.coef = function(x) 1/exp(x))

==========================================
                   Dependent variable:    
               ---------------------------
                        agecensor         
------------------------------------------
sex                     0.761***          
                         (0.032)          
                                          
schooling               0.988***          
                         (0.004)          
                                          
locsize012              0.980***          
                         (0.046)          
                                          
locsize013              1.101***          
                         (0.055)          
                                          
locsize014              1.155***          
                         (0.046)          
                                          
log(scale)                0.010           
                         (0.026)          
                                          
log(shape)              0.099***          
                         (0.019)          
                                          
Constant                  0.248           
                         (0.213)          
                                          
------------------------------------------
Observations              7,552           
Log Likelihood         -17,129.600        
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01
extractAIC(model06)
[1]     8.00 34275.19

v. Log-Normal Model (Covariates, schooling)

model07 <- phreg(formula = Surv(agecensor, died0121) ~ sex + schooling + locsize01, data = MHAS0121, dist = "lognormal") 

model07
Call:
phreg(formula = Surv(agecensor, died0121) ~ sex + schooling + 
    locsize01, data = MHAS0121, dist = "lognormal")

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
(Intercept)                   4.902               0.724     0.000 
sex                 0.450     0.270     1.310     0.032     0.000 
schooling           4.175     0.011     1.011     0.004     0.007 
locsize01 
               1    0.572     0         1           (reference)
               2    0.154     0.020     1.020     0.046     0.664 
               3    0.099    -0.096     0.908     0.055     0.078 
               4    0.175    -0.144     0.866     0.046     0.002 

log(scale)                    5.276               0.141     0.000 
log(shape)                    1.127               0.071     0.000 

Events                    3983 
Total time at risk        591241 
Max. log. likelihood      -17110 
LR test statistic         94.08 
Degrees of freedom        5 
Overall p-value           0
stargazer::stargazer(model07, type="text", out="models.txt", apply.coef = function(x) 1/exp(x))

==========================================
                   Dependent variable:    
               ---------------------------
                        agecensor         
------------------------------------------
sex                     0.763***          
                         (0.032)          
                                          
schooling               0.989***          
                         (0.004)          
                                          
locsize012              0.980***          
                         (0.046)          
                                          
locsize013              1.101***          
                         (0.055)          
                                          
locsize014              1.155***          
                         (0.046)          
                                          
log(scale)                0.005           
                         (0.141)          
                                          
log(shape)              0.324***          
                         (0.071)          
                                          
Constant                  0.007           
                         (0.724)          
                                          
------------------------------------------
Observations              7,552           
Log Likelihood         -17,109.960        
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01
extractAIC(model07)
[1]     8.00 34235.92

II. Cox Regression, schooling

model08a <- coxph(formula = Surv(agecensor, died0121) ~ sex + schooling + locsize01, data = MHAS0121)

stargazer::stargazer(model08a, type="text",  out="models.txt",apply.coef = exp)

================================================
                         Dependent variable:    
                     ---------------------------
                              agecensor         
------------------------------------------------
sex                           1.310***          
                               (0.032)          
                                                
schooling                     1.011***          
                               (0.004)          
                                                
locsize012                    1.018***          
                               (0.046)          
                                                
locsize013                    0.910***          
                               (0.055)          
                                                
locsize014                    0.863***          
                               (0.046)          
                                                
------------------------------------------------
Observations                    7,552           
R2                              0.012           
Max. Possible R2                1.000           
Log Likelihood               -31,309.530        
Wald Test                94.060*** (df = 5)     
LR Test                  93.469*** (df = 5)     
Score (Logrank) Test     94.458*** (df = 5)     
================================================
Note:                *p<0.1; **p<0.05; ***p<0.01
extractAIC(model08a)
[1]     5.00 62629.06
BIC(model08a)
[1] 62660.51

II. Cox Regression, educlevel

model08a_0 <- coxph(formula = Surv(agecensor, died0121) ~ sex + educlevel + locsize01, data = MHAS0121)

stargazer::stargazer(model08a_0, type="text",  out="models.txt",apply.coef = exp)

================================================
                         Dependent variable:    
                     ---------------------------
                              agecensor         
------------------------------------------------
sex                           1.308***          
                               (0.032)          
                                                
educlevel15                   1.105***          
                               (0.039)          
                                                
educlevel68                   1.197***          
                               (0.048)          
                                                
educlevel919                  1.131***          
                               (0.056)          
                                                
locsize012                    1.018***          
                               (0.046)          
                                                
locsize013                    0.917***          
                               (0.055)          
                                                
locsize014                    0.873***          
                               (0.046)          
                                                
------------------------------------------------
Observations                    7,552           
R2                              0.013           
Max. Possible R2                1.000           
Log Likelihood               -31,305.240        
Wald Test                102.130*** (df = 7)    
LR Test                  102.050*** (df = 7)    
Score (Logrank) Test     102.564*** (df = 7)    
================================================
Note:                *p<0.1; **p<0.05; ***p<0.01
extractAIC(model08a_0)
[1]     7.00 62624.48
BIC(model08a_0)
[1] 62668.51

test.ph <- cox.zph(model08a)
test.ph
           chisq df      p
sex        8.677  1 0.0032
schooling  0.658  1 0.4172
locsize01  1.102  3 0.7767
GLOBAL    10.791  5 0.0557
ggcoxzph(test.ph)
Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
collapsing to unique 'x' values

MHAS0121 %>%
survfit2(Surv(agecensor,died0121) ~schooling, data = .) %>%
  ggsurvfit(linetype_aes = TRUE, linewidth =0.5) +
  labs(
    x = "Age",
    y = "Cumulative survival probability"
    ) + 
  add_confidence_interval() +
add_risktable(
  risktable_stats =
    c("{n.risk} ({cum.event})",
      "{round(estimate*100)}% ({round(conf.low*100)}, {round(conf.high*100)})"),
  stats_label = c("At Risk (Cum. Events)", "Survival (95% CI)")
)+
  ggtitle("Figure 2. Kaplan-Meier Plot of Survival by Education")+
  add_quantile()

III. Prediction, Diagnostics, and Time-Varying Covariates

MHAS0121$educ9 <- as.factor(case_when(MHAS0121$schooling >= 9 ~ 1,
                            MHAS0121$schooling <9 ~ 0))
model08b <- coxph(formula = Surv(agecensor, died0121) ~ sex + strata(educ9) + locsize01 + schooling, data = MHAS0121)


stargazer::stargazer(model08b, type="text",  out="models.txt",apply.coef = exp)

================================================
                         Dependent variable:    
                     ---------------------------
                              agecensor         
------------------------------------------------
sex                           1.306***          
                               (0.032)          
                                                
locsize012                    1.015***          
                               (0.046)          
                                                
locsize013                    0.910***          
                               (0.055)          
                                                
locsize014                    0.866***          
                               (0.046)          
                                                
schooling                     1.021***          
                               (0.006)          
                                                
------------------------------------------------
Observations                    7,552           
R2                              0.012           
Max. Possible R2                1.000           
Log Likelihood               -29,866.750        
Wald Test                94.900*** (df = 5)     
LR Test                  94.481*** (df = 5)     
Score (Logrank) Test     95.293*** (df = 5)     
================================================
Note:                *p<0.1; **p<0.05; ***p<0.01
extractAIC(model08b)
[1]     5.00 59743.49
baseline_hazard <- basehaz(model08a)

# Plot the cumulative baseline hazard over time
plot(baseline_hazard$time, baseline_hazard$hazard, type = "l", 
     xlab = "Time", ylab = "Cumulative Baseline Hazard", 
     main = "Cumulative Baseline Hazard under Cox Model", col = "red")

model08c <- coxph(Surv(agecensor,died0121)~sex + locsize01, data = MHAS0121[MHAS0121$schooling == 0 & MHAS0121$sex == 0,] )

model08c
Call:
coxph(formula = Surv(agecensor, died0121) ~ sex + locsize01, 
    data = MHAS0121[MHAS0121$schooling == 0 & MHAS0121$sex == 
        0, ])

               coef exp(coef) se(coef)      z     p
sex              NA        NA  0.00000     NA    NA
locsize012 -0.05284   0.94853  0.10575 -0.500 0.617
locsize013 -0.15156   0.85937  0.12066 -1.256 0.209
locsize014 -0.06388   0.93812  0.08997 -0.710 0.478

Likelihood ratio test=1.77  on 3 df, p=0.6212
n= 1213, number of events= 747 
   (281 observations deleted due to missingness)
surv_fit <- survfit(model08c)


ggsurvplot(surv_fit, 
           data = MHAS0121, 
           conf.int = TRUE,                # Plot confidence intervals
           risk.table = TRUE,              # Add a risk table
           ggtheme = theme_minimal(),      # Use a minimal theme
           title = "Survival Curve: Women with 0 Years of Education and locsize01 = 'Level1'",
           xlab = "Time", 
            size = 0.25,
           censor.size =2,
           xlim=c(45,100),
           ylab = "Survival Probability")

stargazer::stargazer(model08c, type="text",  out="models.txt",apply.coef = exp)

================================================
                         Dependent variable:    
                     ---------------------------
                              agecensor         
------------------------------------------------
sex                                             
                               (0.000)          
                                                
locsize012                    0.949***          
                               (0.106)          
                                                
locsize013                    0.859***          
                               (0.121)          
                                                
locsize014                    0.938***          
                               (0.090)          
                                                
------------------------------------------------
Observations                    1,213           
R2                              0.001           
Max. Possible R2                0.999           
Log Likelihood               -4,517.250         
Wald Test                  1.740 (df = 3)       
LR Test                    1.771 (df = 3)       
Score (Logrank) Test       1.745 (df = 3)       
================================================
Note:                *p<0.1; **p<0.05; ***p<0.01
extractAIC(model08c)
[1]    3.0 9040.5
model08c <- coxph(Surv(agecensor,died0121)~sex + locsize01, data = MHAS0121[MHAS0121$schooling == 0 & MHAS0121$sex == 1,] )

model08c
Call:
coxph(formula = Surv(agecensor, died0121) ~ sex + locsize01, 
    data = MHAS0121[MHAS0121$schooling == 0 & MHAS0121$sex == 
        1, ])

               coef exp(coef) se(coef)      z     p
sex              NA        NA  0.00000     NA    NA
locsize012  0.01113   1.01119  0.13329  0.084 0.933
locsize013 -0.06899   0.93334  0.13396 -0.515 0.607
locsize014 -0.12170   0.88542  0.10429 -1.167 0.243

Likelihood ratio test=1.7  on 3 df, p=0.6359
n= 811, number of events= 528 
   (173 observations deleted due to missingness)
surv_fit <- survfit(model08c)


ggsurvplot(surv_fit, 
           data = MHAS0121, 
           conf.int = TRUE,                # Plot confidence intervals
           risk.table = TRUE,              # Add a risk table
           ggtheme = theme_minimal(),      # Use a minimal theme
           title = "Survival Curve: Men with 0 Years of Education and locsize01 = 'Level1'",
           xlab = "Time",
           xlim = c(45,100),
           size = 0.25,
           censor.size =2,
           ylab = "Survival Probability")

stargazer::stargazer(model08c, type="text",  out="models.txt",apply.coef = exp)

================================================
                         Dependent variable:    
                     ---------------------------
                              agecensor         
------------------------------------------------
sex                                             
                               (0.000)          
                                                
locsize012                    1.011***          
                               (0.133)          
                                                
locsize013                    0.933***          
                               (0.134)          
                                                
locsize014                    0.885***          
                               (0.104)          
                                                
------------------------------------------------
Observations                     811            
R2                              0.002           
Max. Possible R2                0.999           
Log Likelihood               -3,003.628         
Wald Test                  1.700 (df = 3)       
LR Test                    1.705 (df = 3)       
Score (Logrank) Test       1.697 (df = 3)       
================================================
Note:                *p<0.1; **p<0.05; ***p<0.01
extractAIC(model08c)
[1]    3.000 6013.257
model08c <- coxph(Surv(agecensor,died0121)~sex + schooling + locsize01, data = MHAS0121[MHAS0121$schooling == 6 & MHAS0121$sex == 0,] )

model08c
Call:
coxph(formula = Surv(agecensor, died0121) ~ sex + schooling + 
    locsize01, data = MHAS0121[MHAS0121$schooling == 6 & MHAS0121$sex == 
    0, ])

               coef exp(coef) se(coef)     z      p
sex              NA        NA 0.000000    NA     NA
schooling        NA        NA 0.000000    NA     NA
locsize012 0.266921  1.305937 0.157005 1.700 0.0891
locsize013 0.338700  1.403122 0.230728 1.468 0.1421
locsize014 0.006052  1.006070 0.260260 0.023 0.9814

Likelihood ratio test=4.3  on 3 df, p=0.2311
n= 683, number of events= 298 
   (196 observations deleted due to missingness)
surv_fit <- survfit(model08c)


ggsurvplot(surv_fit, 
           data = MHAS0121, 
           conf.int = TRUE,                # Plot confidence intervals
           risk.table = TRUE,              # Add a risk table
           ggtheme = theme_minimal(),      # Use a minimal theme
           title = "Survival Curve: Women with 6 Years of Education and locsize01 = 'Level1'",
           xlab = "Time", 
           xlim = c(45,100),
           ylab = "Survival Probability")

model08c <- coxph(Surv(agecensor,died0121)~sex + schooling + locsize01, data = MHAS0121[MHAS0121$schooling == 6 & MHAS0121$sex == 1,] )

model08c
Call:
coxph(formula = Surv(agecensor, died0121) ~ sex + schooling + 
    locsize01, data = MHAS0121[MHAS0121$schooling == 6 & MHAS0121$sex == 
    1, ])

               coef exp(coef) se(coef)      z      p
sex              NA        NA  0.00000     NA     NA
schooling        NA        NA  0.00000     NA     NA
locsize012 -0.23952   0.78700  0.15223 -1.573 0.1156
locsize013  0.06358   1.06565  0.22378  0.284 0.7763
locsize014 -0.38447   0.68081  0.18541 -2.074 0.0381

Likelihood ratio test=6.65  on 3 df, p=0.0838
n= 655, number of events= 334 
   (175 observations deleted due to missingness)
surv_fit <- survfit(model08c)


ggsurvplot(surv_fit, 
           data = MHAS0121, 
           conf.int = TRUE,                # Plot confidence intervals
           risk.table = TRUE,
           tables.theme = theme_cleantable(),
           axis.offset = TRUE,# Add a risk table
           ggtheme = theme_minimal(),      # Use a minimal theme
           title = "Survival Curve: Men with 6 Years of Education and locsize01 = 'Level1'",
           xlim = c(45,100),
           xlab = "Time", 
           ylab = "Survival Probability")

MHAS0121  %>%
  filter(sex == 0 & schooling == 0) %>%
survfit(Surv(agecensor,died0121) ~ sex + educ9 + locsize01, data = .) %>% 
  ggsurvfit(linetype_aes = FALSE, linewidth =0.5) +
  labs(
    x = "Age",
    y = "Cumulative survival probability"
    ) + 
  add_confidence_interval(type = "ribbon", alpha = 0.05) +
  theme_survminer()+
add_risktable(
  risktable_stats =
    c("{n.risk} ({cum.event})",
      "{round(estimate*100)}% ({round(conf.low*100)}, {round(conf.high*100)})"),
  stats_label = c("At Risk (Cum. Events)", "Survival (95% CI)")
)+
  ggtitle("")+
    scale_x_continuous(limits = c(45, 100))
Warning: Removed 25 rows containing missing values (`geom_step()`).

MHAS0121  %>%
  filter(sex == 1 & schooling == 0) %>%
survfit(Surv(agecensor,died0121) ~ sex + educ9 + locsize01, data = .) %>% 
  ggsurvfit(linetype_aes = FALSE, linewidth =0.5) +
  labs(
    x = "Age",
    y = "Cumulative survival probability"
    ) + 
  add_confidence_interval(type = "ribbon", alpha = 0.05) +
  theme_survminer()+
add_risktable(
  risktable_stats =
    c("{n.risk} ({cum.event})",
      "{round(estimate*100)}% ({round(conf.low*100)}, {round(conf.high*100)})"),
  stats_label = c("At Risk (Cum. Events)", "Survival (95% CI)")
)+
  ggtitle("")+
    scale_x_continuous(limits = c(45, 100))
Warning: Removed 17 rows containing missing values (`geom_step()`).

MHAS0121 <- as.data.frame(MHAS0121)

#::::::::::::::::::::::::::::::::::::::::::::::::::::::::
fit <- survfit( Surv(agecensor, died0121) ~ sex +locsize01+schooling, data = MHAS0121 )
ggsurvplot_facet(fit, MHAS0121[MHAS0121$sex==0,], facet.by = "educ9",
                palette = "jco", pval = TRUE, size = 0.25,xlim = c(45,100), censor.size = 2)
Warning: `as.tibble()` was deprecated in tibble 2.0.0.
ℹ Please use `as_tibble()` instead.
ℹ The signature and semantics have changed, see `?as_tibble`.
ℹ The deprecated feature was likely used in the survminer package.
  Please report the issue at <https://github.com/kassambara/survminer/issues>.
Warning: `select_()` was deprecated in dplyr 0.7.0.
ℹ Please use `select()` instead.
ℹ The deprecated feature was likely used in the survminer package.
  Please report the issue at <https://github.com/kassambara/survminer/issues>.
Warning: This manual palette can handle a maximum of 10 values. You have
supplied 66.
Warning: Removed 1611 rows containing missing values (`geom_step()`).
Warning: Removed 818 rows containing missing values (`geom_point()`).

#::::::::::::::::::::::::::::::::::::::::::::::::::::::::
fit <- survfit( Surv(agecensor, died0121) ~ sex +locsize01+schooling, data = MHAS0121 )
ggsurvplot_facet(fit, MHAS0121[MHAS0121$sex==1,], facet.by = "educ9",
                palette = "jco", pval = TRUE, size = 0.25,xlim = c(45,100), censor.size = 2)
Warning: This manual palette can handle a maximum of 10 values. You have
supplied 70.
Warning: Removed 1723 rows containing missing values (`geom_step()`).
Warning: Removed 830 rows containing missing values (`geom_point()`).

#::::::::::::::::::::::::::::::::::::::::::::::::::::::::
fit <- survfit( Surv(agecensor, died0121) ~ sex +locsize01+schooling, data = MHAS0121 )
ggsurvplot_facet(fit, MHAS0121[MHAS0121$sex==0,], facet.by = "locsize01",
                palette = "jco", pval = TRUE, size = 0.25,xlim = c(45,100), censor.size = 2)
Warning: This manual palette can handle a maximum of 10 values. You have
supplied 20.
Warning: Removed 266 rows containing missing values (`geom_step()`).
Warning: Removed 159 rows containing missing values (`geom_point()`).

#::::::::::::::::::::::::::::::::::::::::::::::::::::::::
fit <- survfit( Surv(agecensor, died0121) ~ sex +locsize01+schooling, data = MHAS0121 )
ggsurvplot_facet(fit, MHAS0121[MHAS0121$sex==1,], facet.by = "locsize01",
                palette = "jco", pval = TRUE, size = 0.25,xlim = c(45,100), censor.size = 2)
Warning: This manual palette can handle a maximum of 10 values. You have
supplied 20.
Warning: Removed 392 rows containing missing values (`geom_step()`).
Warning: Removed 221 rows containing missing values (`geom_point()`).

D. Multiple Models (Educlevel)

i. Exponential Model (Covariate, educlevel)

model03_0 <- phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + locsize01, data = MHAS0121, dist = "weibull", shape =1) 
model03_0
Call:
phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + 
    locsize01, data = MHAS0121, dist = "weibull", shape = 1)

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
sex                 0.450     0.131     1.140     0.032     0.000 
educlevel 
               0    0.276     0         1           (reference)
              15    0.356    -0.136     0.872     0.039     0.000 
              68    0.210    -0.287     0.751     0.047     0.000 
             919    0.158    -0.458     0.633     0.055     0.000 
locsize01 
               1    0.572     0         1           (reference)
               2    0.154    -0.024     0.976     0.046     0.599 
               3    0.099    -0.050     0.951     0.055     0.356 
               4    0.175    -0.140     0.869     0.046     0.002 

log(scale)                    4.857               0.036     0.000 

 Shape is fixed at  1 

Events                    3983 
Total time at risk        591241 
Max. log. likelihood      -23852 
LR test statistic         93.06 
Degrees of freedom        7 
Overall p-value           0
stargazer::stargazer(model03, type="text",  out="models.txt",apply.coef = exp)

==========================================
                   Dependent variable:    
               ---------------------------
                        agecensor         
------------------------------------------
sex                     1.144***          
                         (0.032)          
                                          
schooling               0.963***          
                         (0.004)          
                                          
locsize012              0.978***          
                         (0.045)          
                                          
locsize013              0.954***          
                         (0.055)          
                                          
locsize014              0.871***          
                         (0.045)          
                                          
log(scale)             132.128***         
                         (0.031)          
                                          
------------------------------------------
Observations              7,552           
Log Likelihood         -23,852.450        
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01
extractAIC(model03_0)
[1]     8.00 47720.44

ii. Gompertz Model (Covariate, educlevel)

model04_0 <- phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + locsize01, data = MHAS0121, dist = "gompertz") 
Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
model04_0
Call:
phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + 
    locsize01, data = MHAS0121, dist = "gompertz")

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
sex                 0.450     0.271     1.311     0.032     0.000 
educlevel 
               0    0.276     0         1           (reference)
              15    0.356     0.157     1.170     0.039     0.000 
              68    0.210     0.257     1.293     0.048     0.000 
             919    0.158     0.207     1.229     0.056     0.000 
locsize01 
               1    0.572     0         1           (reference)
               2    0.154     0.036     1.036     0.046     0.435 
               3    0.099    -0.089     0.914     0.055     0.103 
               4    0.175    -0.124     0.883     0.046     0.007 

log(scale)                    2.177               0.011     0.000 
log(shape)                  -10.222               0.112     0.000 

Events                    3983 
Total time at risk        591241 
Max. log. likelihood      -17266 
LR test statistic         126.80 
Degrees of freedom        7 
Overall p-value           0
stargazer::stargazer(model04_0, type="text",  out="models.txt",apply.coef = exp)

==========================================
                   Dependent variable:    
               ---------------------------
                        agecensor         
------------------------------------------
sex                     1.311***          
                         (0.032)          
                                          
educlevel15             1.170***          
                         (0.039)          
                                          
educlevel68             1.293***          
                         (0.048)          
                                          
educlevel919            1.229***          
                         (0.056)          
                                          
locsize012              1.036***          
                         (0.046)          
                                          
locsize013              0.914***          
                         (0.055)          
                                          
locsize014              0.883***          
                         (0.046)          
                                          
log(scale)              8.818***          
                         (0.011)          
                                          
log(shape)               0.00004          
                         (0.112)          
                                          
------------------------------------------
Observations              7,552           
Log Likelihood         -17,266.100        
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01
extractAIC(model04_0)
[1]     9.0 34550.2

iii. Weibull Model (Covariate, educlevel)

model05_0 <- phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + locsize01, data = MHAS0121, dist = "weibull") 
model05_0
Call:
phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + 
    locsize01, data = MHAS0121, dist = "weibull")

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
sex                 0.450     0.271     1.311     0.032     0.000 
educlevel 
               0    0.276     0         1           (reference)
              15    0.356     0.124     1.132     0.039     0.001 
              68    0.210     0.215     1.240     0.048     0.000 
             919    0.158     0.165     1.180     0.056     0.003 
locsize01 
               1    0.572     0         1           (reference)
               2    0.154     0.026     1.026     0.046     0.568 
               3    0.099    -0.090     0.914     0.055     0.102 
               4    0.175    -0.130     0.878     0.046     0.005 

log(scale)                    4.496               0.004     0.000 
log(shape)                    2.237               0.012     0.000 

Events                    3983 
Total time at risk        591241 
Max. log. likelihood      -17141 
LR test statistic         113.71 
Degrees of freedom        7 
Overall p-value           0
stargazer::stargazer(model05_0, type="text",  out="models.txt",apply.coef = exp)

==========================================
                   Dependent variable:    
               ---------------------------
                        agecensor         
------------------------------------------
sex                     1.311***          
                         (0.032)          
                                          
educlevel15             1.132***          
                         (0.039)          
                                          
educlevel68             1.240***          
                         (0.048)          
                                          
educlevel919            1.180***          
                         (0.056)          
                                          
locsize012              1.026***          
                         (0.046)          
                                          
locsize013              0.914***          
                         (0.055)          
                                          
locsize014              0.878***          
                         (0.046)          
                                          
log(scale)              89.653***         
                         (0.004)          
                                          
log(shape)              9.362***          
                         (0.012)          
                                          
------------------------------------------
Observations              7,552           
Log Likelihood         -17,140.590        
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01
extractAIC(model05_0)
[1]     9.00 34299.19

iv. Log-Logistic Model (Covariate, educlevel)

model05_0 <- phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + locsize01, data = MHAS0121, dist = "loglogistic") 
model05_0
Call:
phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + 
    locsize01, data = MHAS0121, dist = "loglogistic")

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
(Intercept)                   1.396               0.221     0.000 
sex                 0.450     0.271     1.312     0.032     0.000 
educlevel 
               0    0.276     0         1           (reference)
              15    0.356     0.103     1.109     0.039     0.008 
              68    0.210     0.189     1.208     0.048     0.000 
             919    0.158     0.142     1.152     0.056     0.012 
locsize01 
               1    0.572     0         1           (reference)
               2    0.154     0.020     1.020     0.046     0.658 
               3    0.099    -0.089     0.915     0.055     0.104 
               4    0.175    -0.133     0.876     0.046     0.004 

log(scale)                    4.617               0.027     0.000 
log(shape)                    2.314               0.019     0.000 

Events                    3983 
Total time at risk        591241 
Max. log. likelihood      -17125 
LR test statistic         106.21 
Degrees of freedom        7 
Overall p-value           0
stargazer::stargazer(model05_0, type="text",  out="models.txt",apply.coef = function(x) 1/exp(x))

==========================================
                   Dependent variable:    
               ---------------------------
                        agecensor         
------------------------------------------
sex                     0.762***          
                         (0.032)          
                                          
educlevel15             0.902***          
                         (0.039)          
                                          
educlevel68             0.828***          
                         (0.048)          
                                          
educlevel919            0.868***          
                         (0.056)          
                                          
locsize012              0.980***          
                         (0.046)          
                                          
locsize013              1.093***          
                         (0.055)          
                                          
locsize014              1.142***          
                         (0.046)          
                                          
log(scale)                0.010           
                         (0.027)          
                                          
log(shape)              0.099***          
                         (0.019)          
                                          
Constant                  0.248           
                         (0.221)          
                                          
------------------------------------------
Observations              7,552           
Log Likelihood         -17,125.290        
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01
extractAIC(model05_0)
[1]    10.00 34270.59

v. Log-Normal Model (Covariate, educlevel)

model06_0 <- phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + locsize01, data = MHAS0121, dist = "lognormal") 
model06_0
Call:
phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + 
    locsize01, data = MHAS0121, dist = "lognormal")

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
(Intercept)                   5.004               0.754     0.000 
sex                 0.450     0.269     1.308     0.032     0.000 
educlevel 
               0    0.276     0         1           (reference)
              15    0.356     0.098     1.103     0.039     0.011 
              68    0.210     0.178     1.195     0.048     0.000 
             919    0.158     0.128     1.137     0.056     0.023 
locsize01 
               1    0.572     0         1           (reference)
               2    0.154     0.020     1.020     0.046     0.666 
               3    0.099    -0.090     0.914     0.055     0.103 
               4    0.175    -0.133     0.875     0.046     0.004 

log(scale)                    5.303               0.147     0.000 
log(shape)                    1.114               0.072     0.000 

Events                    3983 
Total time at risk        591241 
Max. log. likelihood      -17106 
LR test statistic         102.12 
Degrees of freedom        7 
Overall p-value           0
stargazer::stargazer(model06_0, type="text",  out="models.txt",apply.coef = function(x) 1/exp(x))

==========================================
                   Dependent variable:    
               ---------------------------
                        agecensor         
------------------------------------------
sex                     0.764***          
                         (0.032)          
                                          
educlevel15             0.906***          
                         (0.039)          
                                          
educlevel68             0.837***          
                         (0.048)          
                                          
educlevel919            0.879***          
                         (0.056)          
                                          
locsize012              0.980***          
                         (0.046)          
                                          
locsize013              1.094***          
                         (0.055)          
                                          
locsize014              1.142***          
                         (0.046)          
                                          
log(scale)                0.005           
                         (0.147)          
                                          
log(shape)              0.328***          
                         (0.072)          
                                          
Constant                  0.007           
                         (0.754)          
                                          
------------------------------------------
Observations              7,552           
Log Likelihood         -17,105.940        
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01
extractAIC(model06_0)
[1]    10.00 34231.87