DEM7223 Event History Analysis Fall 2024 Problem Set 2 Univariate Event History Analysis

Author

Jacob M. Souch

Import Data

library(survival)
library(tidyverse)

Warning: package 'ggplot2' was built under R version 4.3.2

Warning: package 'readr' was built under R version 4.3.2

Warning: package 'purrr' was built under R version 4.3.2

Warning: package 'dplyr' was built under R version 4.3.2

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(survminer)

Warning: package 'survminer' was built under R version 4.3.2

Loading required package: ggpubr

Warning: package 'ggpubr' was built under R version 4.3.2


Attaching package: 'survminer'

The following object is masked from 'package:survival':

    myeloma

library(ggsurvfit)
library(eha)

Warning: package 'eha' was built under R version 4.3.3

library(survival)

library(readr)
MHAS0121 <- read_csv("C:/Users/jacob/Downloads/MHAS0121_NOLABELS.csv")

Rows: 9636 Columns: 39
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (39): id, locsize01, perwght01, died18, died21, mobirth, yrbirth, female...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

fit <- with(MHAS0121, Surv(agecensor,died0121))

ID – Unique person identifier

FEMALE – dummy variable for whether the sampled individual identifies as a woman or a man (no other options; note that in some in-person questionnaires, this is based on external classification by interviewers as opposed to self-reported by interviewees).

SCHOOLING – Years of formal schooling.

EDUCLEVEL – Schooling variable coding in intervals, including: no formal schooling (0), 1 – 5 years (15), 6-8 years (68), and 9 or more years (because it is topped at 19 years, value is 919).

LOCSIZE01 – Size of the locality of residence at baseline in 2001, ranging from 100,000+ (1), 15,000 – 99,999 (2), 2,500 – 14,999 (3), and less than 2,500 (4). In Mexico, urban areas are defined as those having at least 2,500 inhabitants (one of the lowest bars for urban I have ever encountered).

MOBIRTH – Month of birth

YRBIRTH – Year of birth

MOINT – Month of baseline interview (in 2001)

YRINT – Year of baseline interview (i.e., 2001)

DIED0121 – Dummy variable for whether person passed away in the inter-wave period (otherwise,censored at re-interview, lost to follow-up, or refused interview, all coded as 0)

MOCENSOR – Month in which person was ‘observed’ for the last time (whether because s/he passed

away, was re-interviewed, rejected the survey, or when we assume was lost to follow-up). YRCENSOR – Year in which person was ‘observed’ for the last time (see MOCENSOR). We will also create other variables in Stata that we will need for the exercise (see accompanying Stata do file):

AGEINT – Age at which person was interviewed at baseline

AGECENSOR – Age the interviewee was when we observed them last.

(1) the variable that measures the time each person was exposed to the risk of death before dying/being censored for other reasons.

Agecensor is the variabel that measures the time each person was exposed to the risk of death before dying or censoring.

(2) the variable that indicates whether the event occurred or not (i.e., whether the person died in the interval); Died0121 is the variable that indicates whether or not th person has died.

(3)the time in which the person entered and exited the risk set. After doing this (or, without it, if your package does not require it):

Data Preparation

MHAS0121$sex <- abs(as.numeric(MHAS0121$female) - 1)
MHAS0121$locsize01 <- as.factor(MHAS0121$locsize01)
MHAS0121$educlevel <- as.factor(MHAS0121$educlevel)

Models

I. Multivariate parametric proportional hazards models

A. Intercept-Only Exponential Model

Estimate parametric proportional hazards model with an exponential (i.e., constant) with no covariates. Instead of hazard ratios, obtain coefficients. Exponentiate the intercept of the model. (Later on, we will compare this to the crude death rate for people 50+ we will calculate in a later problem set).

model01 <- phreg(formula = Surv(agecensor, died0121) ~ 1, data = MHAS0121, dist = "weibull", shape =1) 
model01

Call:
phreg(formula = Surv(agecensor, died0121) ~ 1, data = MHAS0121, 
    dist = "weibull", shape = 1)

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
log(scale)                    4.999               0.016     0.000 

 Shape is fixed at  1 

Events                    3994 
Total time at risk        592255 
Max. log. likelihood      -23961

stargazer::stargazer(model01, type="text",  out="models.txt",apply.coef = exp)


==========================================
                   Dependent variable:    
               ---------------------------
                        agecensor         
------------------------------------------
log(scale)             148.286***         
                         (0.016)          
                                          
------------------------------------------
Observations              7,565           
Log Likelihood         -23,960.580        
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01

extractAIC(model01)

[1]     1.00 47923.16

B. Female Covariate Exponential Model

model02 <- phreg(formula = Surv(agecensor, died0121) ~ factor(sex), data = MHAS0121, dist = "weibull", shape =1) 
model02

Call:
phreg(formula = Surv(agecensor, died0121) ~ factor(sex), data = MHAS0121, 
    dist = "weibull", shape = 1)

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
factor(sex) 
               0    0.549     0         1           (reference)
               1    0.451     0.099     1.104     0.032     0.002 

log(scale)                    5.045               0.022     0.000 

 Shape is fixed at  1 

Events                    3994 
Total time at risk        592255 
Max. log. likelihood      -23956 
LR test statistic         9.74 
Degrees of freedom        1 
Overall p-value           0.00179989

stargazer::stargazer(model02, type="text",  out="models.txt",apply.coef = exp)


==========================================
                   Dependent variable:    
               ---------------------------
                        agecensor         
------------------------------------------
factor(sex)1            1.104***          
                         (0.032)          
                                          
log(scale)             155.240***         
                         (0.022)          
                                          
------------------------------------------
Observations              7,565           
Log Likelihood         -23,955.710        
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01

extractAIC(model02)

[1]     2.00 47915.42

Estimate another PH parametric model controlling for FEMALE, SCHOOLING (or
categories of EDUCLEVEL, whatever you prefer, justifying your choice), and LOCSIZE01. Assuming the baseline hazard is distributed:

C. Multiple Models

i. Exponential Model (Covariates)

model03 <- phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + locsize01, data = MHAS0121, dist = "weibull", shape =1) 
model03

Call:
phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + 
    locsize01, data = MHAS0121, dist = "weibull", shape = 1)

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
sex                 0.450     0.131     1.140     0.032     0.000 
educlevel 
               0    0.276     0         1           (reference)
              15    0.356    -0.136     0.872     0.039     0.000 
              68    0.210    -0.287     0.751     0.047     0.000 
             919    0.158    -0.458     0.633     0.055     0.000 
locsize01 
               1    0.572     0         1           (reference)
               2    0.154    -0.024     0.976     0.046     0.599 
               3    0.099    -0.050     0.951     0.055     0.356 
               4    0.175    -0.140     0.869     0.046     0.002 

log(scale)                    4.857               0.036     0.000 

 Shape is fixed at  1 

Events                    3983 
Total time at risk        591241 
Max. log. likelihood      -23852 
LR test statistic         93.06 
Degrees of freedom        7 
Overall p-value           0

stargazer::stargazer(model03, type="text",  out="models.txt",apply.coef = exp)


==========================================
                   Dependent variable:    
               ---------------------------
                        agecensor         
------------------------------------------
sex                     1.140***          
                         (0.032)          
                                          
educlevel15             0.872***          
                         (0.039)          
                                          
educlevel68             0.751***          
                         (0.047)          
                                          
educlevel919            0.633***          
                         (0.055)          
                                          
locsize012              0.976***          
                         (0.046)          
                                          
locsize013              0.951***          
                         (0.055)          
                                          
locsize014              0.869***          
                         (0.046)          
                                          
log(scale)             128.614***         
                         (0.036)          
                                          
------------------------------------------
Observations              7,552           
Log Likelihood         -23,852.220        
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01

extractAIC(model03)

[1]     8.00 47720.44

ii. Gompertz Model (Covariates)

model04 <- phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + locsize01, data = MHAS0121, dist = "gompertz")

Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale

Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale

Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale

Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale

Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale

Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale

Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale

Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale

model04

Call:
phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + 
    locsize01, data = MHAS0121, dist = "gompertz")

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
sex                 0.450     0.271     1.311     0.032     0.000 
educlevel 
               0    0.276     0         1           (reference)
              15    0.356     0.157     1.170     0.039     0.000 
              68    0.210     0.257     1.293     0.048     0.000 
             919    0.158     0.207     1.229     0.056     0.000 
locsize01 
               1    0.572     0         1           (reference)
               2    0.154     0.036     1.036     0.046     0.435 
               3    0.099    -0.089     0.914     0.055     0.103 
               4    0.175    -0.124     0.883     0.046     0.007 

log(scale)                    2.177               0.011     0.000 
log(shape)                  -10.222               0.112     0.000 

Events                    3983 
Total time at risk        591241 
Max. log. likelihood      -17266 
LR test statistic         126.80 
Degrees of freedom        7 
Overall p-value           0

stargazer::stargazer(model04, type="text",  out="models.txt",apply.coef = exp)


==========================================
                   Dependent variable:    
               ---------------------------
                        agecensor         
------------------------------------------
sex                     1.311***          
                         (0.032)          
                                          
educlevel15             1.170***          
                         (0.039)          
                                          
educlevel68             1.293***          
                         (0.048)          
                                          
educlevel919            1.229***          
                         (0.056)          
                                          
locsize012              1.036***          
                         (0.046)          
                                          
locsize013              0.914***          
                         (0.055)          
                                          
locsize014              0.883***          
                         (0.046)          
                                          
log(scale)              8.818***          
                         (0.011)          
                                          
log(shape)               0.00004          
                         (0.112)          
                                          
------------------------------------------
Observations              7,552           
Log Likelihood         -17,266.100        
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01

extractAIC(model04)

[1]     9.0 34550.2

iii. Weibull Model (Covariates)

model05 <- phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + locsize01, data = MHAS0121, dist = "weibull") 
model05

Call:
phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + 
    locsize01, data = MHAS0121, dist = "weibull")

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
sex                 0.450     0.271     1.311     0.032     0.000 
educlevel 
               0    0.276     0         1           (reference)
              15    0.356     0.124     1.132     0.039     0.001 
              68    0.210     0.215     1.240     0.048     0.000 
             919    0.158     0.165     1.180     0.056     0.003 
locsize01 
               1    0.572     0         1           (reference)
               2    0.154     0.026     1.026     0.046     0.568 
               3    0.099    -0.090     0.914     0.055     0.102 
               4    0.175    -0.130     0.878     0.046     0.005 

log(scale)                    4.496               0.004     0.000 
log(shape)                    2.237               0.012     0.000 

Events                    3983 
Total time at risk        591241 
Max. log. likelihood      -17141 
LR test statistic         113.71 
Degrees of freedom        7 
Overall p-value           0

stargazer::stargazer(model05, type="text",  out="models.txt",apply.coef = exp)


==========================================
                   Dependent variable:    
               ---------------------------
                        agecensor         
------------------------------------------
sex                     1.311***          
                         (0.032)          
                                          
educlevel15             1.132***          
                         (0.039)          
                                          
educlevel68             1.240***          
                         (0.048)          
                                          
educlevel919            1.180***          
                         (0.056)          
                                          
locsize012              1.026***          
                         (0.046)          
                                          
locsize013              0.914***          
                         (0.055)          
                                          
locsize014              0.878***          
                         (0.046)          
                                          
log(scale)              89.653***         
                         (0.004)          
                                          
log(shape)              9.362***          
                         (0.012)          
                                          
------------------------------------------
Observations              7,552           
Log Likelihood         -17,140.590        
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01

extractAIC(model05)

[1]     9.00 34299.19

iv. Log-Logistic Model (Covariates)

model06 <- phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + locsize01, data = MHAS0121, dist = "loglogistic") 
model06

Call:
phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + 
    locsize01, data = MHAS0121, dist = "loglogistic")

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
(Intercept)                   1.396               0.221     0.000 
sex                 0.450     0.271     1.312     0.032     0.000 
educlevel 
               0    0.276     0         1           (reference)
              15    0.356     0.103     1.109     0.039     0.008 
              68    0.210     0.189     1.208     0.048     0.000 
             919    0.158     0.142     1.152     0.056     0.012 
locsize01 
               1    0.572     0         1           (reference)
               2    0.154     0.020     1.020     0.046     0.658 
               3    0.099    -0.089     0.915     0.055     0.104 
               4    0.175    -0.133     0.876     0.046     0.004 

log(scale)                    4.617               0.027     0.000 
log(shape)                    2.314               0.019     0.000 

Events                    3983 
Total time at risk        591241 
Max. log. likelihood      -17125 
LR test statistic         106.21 
Degrees of freedom        7 
Overall p-value           0

stargazer::stargazer(model06, type="text",  out="models.txt",apply.coef = exp)


==========================================
                   Dependent variable:    
               ---------------------------
                        agecensor         
------------------------------------------
sex                     1.312***          
                         (0.032)          
                                          
educlevel15             1.109***          
                         (0.039)          
                                          
educlevel68             1.208***          
                         (0.048)          
                                          
educlevel919            1.152***          
                         (0.056)          
                                          
locsize012              1.020***          
                         (0.046)          
                                          
locsize013              0.915***          
                         (0.055)          
                                          
locsize014              0.876***          
                         (0.046)          
                                          
log(scale)             101.165***         
                         (0.027)          
                                          
log(shape)              10.114***         
                         (0.019)          
                                          
Constant                4.038***          
                         (0.221)          
                                          
------------------------------------------
Observations              7,552           
Log Likelihood         -17,125.290        
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01

extractAIC(model06)

[1]    10.00 34270.59

v. Log-Normal Model (Covariates)

model07 <- phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + locsize01, data = MHAS0121, dist = "lognormal") 

model07

Call:
phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel + 
    locsize01, data = MHAS0121, dist = "lognormal")

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
(Intercept)                   5.004               0.754     0.000 
sex                 0.450     0.269     1.308     0.032     0.000 
educlevel 
               0    0.276     0         1           (reference)
              15    0.356     0.098     1.103     0.039     0.011 
              68    0.210     0.178     1.195     0.048     0.000 
             919    0.158     0.128     1.137     0.056     0.023 
locsize01 
               1    0.572     0         1           (reference)
               2    0.154     0.020     1.020     0.046     0.666 
               3    0.099    -0.090     0.914     0.055     0.103 
               4    0.175    -0.133     0.875     0.046     0.004 

log(scale)                    5.303               0.147     0.000 
log(shape)                    1.114               0.072     0.000 

Events                    3983 
Total time at risk        591241 
Max. log. likelihood      -17106 
LR test statistic         102.12 
Degrees of freedom        7 
Overall p-value           0

stargazer::stargazer(model07, type="text",  out="models.txt",apply.coef = exp)


==========================================
                   Dependent variable:    
               ---------------------------
                        agecensor         
------------------------------------------
sex                     1.308***          
                         (0.032)          
                                          
educlevel15             1.103***          
                         (0.039)          
                                          
educlevel68             1.195***          
                         (0.048)          
                                          
educlevel919            1.137***          
                         (0.056)          
                                          
locsize012              1.020***          
                         (0.046)          
                                          
locsize013              0.914***          
                         (0.055)          
                                          
locsize014              0.875***          
                         (0.046)          
                                          
log(scale)             201.015***         
                         (0.147)          
                                          
log(shape)              3.048***          
                         (0.072)          
                                          
Constant               149.011***         
                         (0.754)          
                                          
------------------------------------------
Observations              7,552           
Log Likelihood         -17,105.940        
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01

extractAIC(model07)

[1]    10.00 34231.87

II. Cox Regression

model08a <- coxph(formula = Surv(agecensor, died0121) ~ sex + schooling + locsize01, data = MHAS0121)

stargazer::stargazer(model08a, type="text",  out="models.txt",apply.coef = exp)


================================================
                         Dependent variable:    
                     ---------------------------
                              agecensor         
------------------------------------------------
sex                           1.310***          
                               (0.032)          
                                                
schooling                     1.011***          
                               (0.004)          
                                                
locsize012                    1.018***          
                               (0.046)          
                                                
locsize013                    0.910***          
                               (0.055)          
                                                
locsize014                    0.863***          
                               (0.046)          
                                                
------------------------------------------------
Observations                    7,552           
R2                              0.012           
Max. Possible R2                1.000           
Log Likelihood               -31,309.530        
Wald Test                94.060*** (df = 5)     
LR Test                  93.469*** (df = 5)     
Score (Logrank) Test     94.458*** (df = 5)     
================================================
Note:                *p<0.1; **p<0.05; ***p<0.01

extractAIC(model08a)

[1]     5.00 62629.06

model08a <- coxph(formula = Surv(agecensor, died0121) ~ sex + educlevel + locsize01, data = MHAS0121)

stargazer::stargazer(model08a, type="text",  out="models.txt",apply.coef = exp)


================================================
                         Dependent variable:    
                     ---------------------------
                              agecensor         
------------------------------------------------
sex                           1.308***          
                               (0.032)          
                                                
educlevel15                   1.105***          
                               (0.039)          
                                                
educlevel68                   1.197***          
                               (0.048)          
                                                
educlevel919                  1.131***          
                               (0.056)          
                                                
locsize012                    1.018***          
                               (0.046)          
                                                
locsize013                    0.917***          
                               (0.055)          
                                                
locsize014                    0.873***          
                               (0.046)          
                                                
------------------------------------------------
Observations                    7,552           
R2                              0.013           
Max. Possible R2                1.000           
Log Likelihood               -31,305.240        
Wald Test                102.130*** (df = 7)    
LR Test                  102.050*** (df = 7)    
Score (Logrank) Test     102.564*** (df = 7)    
================================================
Note:                *p<0.1; **p<0.05; ***p<0.01

extractAIC(model08a)

[1]     7.00 62624.48

test.ph <- cox.zph(model08a)
test.ph

          chisq df      p
sex        8.57  1 0.0034
educlevel  2.53  3 0.4701
locsize01  1.45  3 0.6947
GLOBAL    12.79  7 0.0774

ggcoxzph(test.ph)

Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
collapsing to unique 'x' values

MHAS0121 %>%
survfit2(Surv(agecensor,died0121) ~educlevel, data = .) %>%
  ggsurvfit(linetype_aes = TRUE, linewidth =0.5) +
  labs(
    x = "Age",
    y = "Cumulative survival probability"
    ) + 
  add_confidence_interval() +
add_risktable(
  risktable_stats =
    c("{n.risk} ({cum.event})",
      "{round(estimate*100)}% ({round(conf.low*100)}, {round(conf.high*100)})"),
  stats_label = c("At Risk (Cum. Events)", "Survival (95% CI)")
)+
  ggtitle("Figure 2. Kaplan-Meier Plot of Survival by Education")+
  add_quantile()

III. Prediction, Diagnostics, and Time-Varying Covariates

MHAS0121$educ9 <- as.factor(case_when(MHAS0121$schooling >= 9 ~ 1,
                            MHAS0121$schooling <9 ~ 0))

model08b <- coxph(formula = Surv(agecensor, died0121) ~ sex + strata(educ9) + locsize01, data = MHAS0121)


stargazer::stargazer(model08b, type="text",  out="models.txt",apply.coef = exp)


================================================
                         Dependent variable:    
                     ---------------------------
                              agecensor         
------------------------------------------------
sex                           1.316***          
                               (0.032)          
                                                
locsize012                    1.005***          
                               (0.046)          
                                                
locsize013                    0.890***          
                               (0.054)          
                                                
locsize014                    0.840***          
                               (0.045)          
                                                
------------------------------------------------
Observations                    7,552           
R2                              0.011           
Max. Possible R2                1.000           
Log Likelihood               -29,871.940        
Wald Test                84.400*** (df = 4)     
LR Test                  84.099*** (df = 4)     
Score (Logrank) Test     84.759*** (df = 4)     
================================================
Note:                *p<0.1; **p<0.05; ***p<0.01

extractAIC(model08b)

[1]     4.00 59751.88

baseline_hazard <- basehaz(model08a)

# Plot the cumulative baseline hazard over time
plot(baseline_hazard$time, baseline_hazard$hazard, type = "l", 
     xlab = "Time", ylab = "Cumulative Baseline Hazard", 
     main = "Cumulative Baseline Hazard under Cox Model", col = "red")

model08c <- coxph(Surv(agecensor,died0121)~sex + locsize01, data = MHAS0121[MHAS0121$schooling == 0 & MHAS0121$sex == 0,] )

model08c

Call:
coxph(formula = Surv(agecensor, died0121) ~ sex + locsize01, 
    data = MHAS0121[MHAS0121$schooling == 0 & MHAS0121$sex == 
        0, ])

               coef exp(coef) se(coef)      z     p
sex              NA        NA  0.00000     NA    NA
locsize012 -0.05284   0.94853  0.10575 -0.500 0.617
locsize013 -0.15156   0.85937  0.12066 -1.256 0.209
locsize014 -0.06388   0.93812  0.08997 -0.710 0.478

Likelihood ratio test=1.77  on 3 df, p=0.6212
n= 1213, number of events= 747 
   (281 observations deleted due to missingness)

surv_fit <- survfit(model08c)


ggsurvplot(surv_fit, 
           data = MHAS0121, 
           conf.int = TRUE,                # Plot confidence intervals
           risk.table = TRUE,              # Add a risk table
           ggtheme = theme_minimal(),      # Use a minimal theme
           title = "Survival Curve: Women with 0 Years of Education and locsize01 = 'Level1'",
           xlab = "Time", 
            size = 0.25,
           censor.size =2,
           xlim=c(45,100),
           ylab = "Survival Probability")

stargazer::stargazer(model08c, type="text",  out="models.txt",apply.coef = exp)


================================================
                         Dependent variable:    
                     ---------------------------
                              agecensor         
------------------------------------------------
sex                                             
                               (0.000)          
                                                
locsize012                    0.949***          
                               (0.106)          
                                                
locsize013                    0.859***          
                               (0.121)          
                                                
locsize014                    0.938***          
                               (0.090)          
                                                
------------------------------------------------
Observations                    1,213           
R2                              0.001           
Max. Possible R2                0.999           
Log Likelihood               -4,517.250         
Wald Test                  1.740 (df = 3)       
LR Test                    1.771 (df = 3)       
Score (Logrank) Test       1.745 (df = 3)       
================================================
Note:                *p<0.1; **p<0.05; ***p<0.01

extractAIC(model08c)

[1]    3.0 9040.5

model08c <- coxph(Surv(agecensor,died0121)~sex + locsize01, data = MHAS0121[MHAS0121$schooling == 0 & MHAS0121$sex == 1,] )

model08c

Call:
coxph(formula = Surv(agecensor, died0121) ~ sex + locsize01, 
    data = MHAS0121[MHAS0121$schooling == 0 & MHAS0121$sex == 
        1, ])

               coef exp(coef) se(coef)      z     p
sex              NA        NA  0.00000     NA    NA
locsize012  0.01113   1.01119  0.13329  0.084 0.933
locsize013 -0.06899   0.93334  0.13396 -0.515 0.607
locsize014 -0.12170   0.88542  0.10429 -1.167 0.243

Likelihood ratio test=1.7  on 3 df, p=0.6359
n= 811, number of events= 528 
   (173 observations deleted due to missingness)

surv_fit <- survfit(model08c)


ggsurvplot(surv_fit, 
           data = MHAS0121, 
           conf.int = TRUE,                # Plot confidence intervals
           risk.table = TRUE,              # Add a risk table
           ggtheme = theme_minimal(),      # Use a minimal theme
           title = "Survival Curve: Men with 0 Years of Education and locsize01 = 'Level1'",
           xlab = "Time",
           xlim = c(45,100),
           size = 0.25,
           censor.size =2,
           ylab = "Survival Probability")

stargazer::stargazer(model08c, type="text",  out="models.txt",apply.coef = exp)


================================================
                         Dependent variable:    
                     ---------------------------
                              agecensor         
------------------------------------------------
sex                                             
                               (0.000)          
                                                
locsize012                    1.011***          
                               (0.133)          
                                                
locsize013                    0.933***          
                               (0.134)          
                                                
locsize014                    0.885***          
                               (0.104)          
                                                
------------------------------------------------
Observations                     811            
R2                              0.002           
Max. Possible R2                0.999           
Log Likelihood               -3,003.628         
Wald Test                  1.700 (df = 3)       
LR Test                    1.705 (df = 3)       
Score (Logrank) Test       1.697 (df = 3)       
================================================
Note:                *p<0.1; **p<0.05; ***p<0.01

extractAIC(model08c)

[1]    3.000 6013.257

model08c <- coxph(Surv(agecensor,died0121)~sex + educlevel + locsize01, data = MHAS0121[MHAS0121$schooling == 6 & MHAS0121$sex == 0,] )

model08c

Call:
coxph(formula = Surv(agecensor, died0121) ~ sex + educlevel + 
    locsize01, data = MHAS0121[MHAS0121$schooling == 6 & MHAS0121$sex == 
    0, ])

                 coef exp(coef) se(coef)     z      p
sex                NA        NA 0.000000    NA     NA
educlevel15        NA        NA 0.000000    NA     NA
educlevel68        NA        NA 0.000000    NA     NA
educlevel919       NA        NA 0.000000    NA     NA
locsize012   0.266921  1.305937 0.157005 1.700 0.0891
locsize013   0.338700  1.403122 0.230728 1.468 0.1421
locsize014   0.006052  1.006070 0.260260 0.023 0.9814

Likelihood ratio test=4.3  on 3 df, p=0.2311
n= 683, number of events= 298 
   (196 observations deleted due to missingness)

surv_fit <- survfit(model08c)


ggsurvplot(surv_fit, 
           data = MHAS0121, 
           conf.int = TRUE,                # Plot confidence intervals
           risk.table = TRUE,              # Add a risk table
           ggtheme = theme_minimal(),      # Use a minimal theme
           title = "Survival Curve: Women with 6 Years of Education and locsize01 = 'Level1'",
           xlab = "Time", 
           xlim = c(45,100),
           ylab = "Survival Probability")

model08c <- coxph(Surv(agecensor,died0121)~sex + educlevel + locsize01, data = MHAS0121[MHAS0121$schooling == 6 & MHAS0121$sex == 1,] )

model08c

Call:
coxph(formula = Surv(agecensor, died0121) ~ sex + educlevel + 
    locsize01, data = MHAS0121[MHAS0121$schooling == 6 & MHAS0121$sex == 
    1, ])

                 coef exp(coef) se(coef)      z      p
sex                NA        NA  0.00000     NA     NA
educlevel15        NA        NA  0.00000     NA     NA
educlevel68        NA        NA  0.00000     NA     NA
educlevel919       NA        NA  0.00000     NA     NA
locsize012   -0.23952   0.78700  0.15223 -1.573 0.1156
locsize013    0.06358   1.06565  0.22378  0.284 0.7763
locsize014   -0.38447   0.68081  0.18541 -2.074 0.0381

Likelihood ratio test=6.65  on 3 df, p=0.0838
n= 655, number of events= 334 
   (175 observations deleted due to missingness)

surv_fit <- survfit(model08c)


ggsurvplot(surv_fit, 
           data = MHAS0121, 
           conf.int = TRUE,                # Plot confidence intervals
           risk.table = TRUE,
           tables.theme = theme_cleantable(),
           axis.offset = TRUE,# Add a risk table
           ggtheme = theme_minimal(),      # Use a minimal theme
           title = "Survival Curve: Men with 6 Years of Education and locsize01 = 'Level1'",
           xlim = c(45,100),
           xlab = "Time", 
           ylab = "Survival Probability")

MHAS0121  %>%
  filter(sex == 0 & schooling == 0) %>%
survfit(Surv(agecensor,died0121) ~ sex + educ9 + locsize01, data = .) %>% 
  ggsurvfit(linetype_aes = FALSE, linewidth =0.5) +
  labs(
    x = "Age",
    y = "Cumulative survival probability"
    ) + 
  add_confidence_interval(type = "ribbon", alpha = 0.05) +
  theme_survminer()+
add_risktable(
  risktable_stats =
    c("{n.risk} ({cum.event})",
      "{round(estimate*100)}% ({round(conf.low*100)}, {round(conf.high*100)})"),
  stats_label = c("At Risk (Cum. Events)", "Survival (95% CI)")
)+
  ggtitle("")+
    scale_x_continuous(limits = c(45, 100))

Warning: Removed 25 rows containing missing values (`geom_step()`).

MHAS0121  %>%
  filter(sex == 1 & schooling == 0) %>%
survfit(Surv(agecensor,died0121) ~ sex + educ9 + locsize01, data = .) %>% 
  ggsurvfit(linetype_aes = FALSE, linewidth =0.5) +
  labs(
    x = "Age",
    y = "Cumulative survival probability"
    ) + 
  add_confidence_interval(type = "ribbon", alpha = 0.05) +
  theme_survminer()+
add_risktable(
  risktable_stats =
    c("{n.risk} ({cum.event})",
      "{round(estimate*100)}% ({round(conf.low*100)}, {round(conf.high*100)})"),
  stats_label = c("At Risk (Cum. Events)", "Survival (95% CI)")
)+
  ggtitle("")+
    scale_x_continuous(limits = c(45, 100))

Warning: Removed 17 rows containing missing values (`geom_step()`).

MHAS0121 <- as.data.frame(MHAS0121)

#::::::::::::::::::::::::::::::::::::::::::::::::::::::::
fit <- survfit( Surv(agecensor, died0121) ~ sex +locsize01+educlevel, data = MHAS0121 )
ggsurvplot_facet(fit, MHAS0121[MHAS0121$sex==0,], facet.by = "educ9",
                palette = "jco", pval = TRUE, size = 0.25,xlim = c(45,100), censor.size = 2)

Warning: `as.tibble()` was deprecated in tibble 2.0.0.
ℹ Please use `as_tibble()` instead.
ℹ The signature and semantics have changed, see `?as_tibble`.
ℹ The deprecated feature was likely used in the survminer package.
  Please report the issue at <https://github.com/kassambara/survminer/issues>.

Warning: `select_()` was deprecated in dplyr 0.7.0.
ℹ Please use `select()` instead.
ℹ The deprecated feature was likely used in the survminer package.
  Please report the issue at <https://github.com/kassambara/survminer/issues>.

Warning: This manual palette can handle a maximum of 10 values. You have
supplied 16.

Warning: Removed 510 rows containing missing values (`geom_step()`).

Warning: Removed 267 rows containing missing values (`geom_point()`).

#::::::::::::::::::::::::::::::::::::::::::::::::::::::::
fit <- survfit( Surv(agecensor, died0121) ~ sex +locsize01+educlevel, data = MHAS0121 )
ggsurvplot_facet(fit, MHAS0121[MHAS0121$sex==1,], facet.by = "educ9",
                palette = "jco", pval = TRUE, size = 0.25,xlim = c(45,100), censor.size = 2)

Warning: This manual palette can handle a maximum of 10 values. You have
supplied 16.

Warning: Removed 569 rows containing missing values (`geom_step()`).

Warning: Removed 292 rows containing missing values (`geom_point()`).

#::::::::::::::::::::::::::::::::::::::::::::::::::::::::
fit <- survfit( Surv(agecensor, died0121) ~ sex +locsize01+educlevel, data = MHAS0121 )
ggsurvplot_facet(fit, MHAS0121[MHAS0121$sex==0,], facet.by = "locsize01",
                palette = "jco", pval = TRUE, size = 0.25,xlim = c(45,100), censor.size = 2)

#::::::::::::::::::::::::::::::::::::::::::::::::::::::::
fit <- survfit( Surv(agecensor, died0121) ~ sex +locsize01+educlevel, data = MHAS0121 )
ggsurvplot_facet(fit, MHAS0121[MHAS0121$sex==1,], facet.by = "locsize01",
                palette = "jco", pval = TRUE, size = 0.25,xlim = c(45,100), censor.size = 2)