Rows: 9636 Columns: 39
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (39): id, locsize01, perwght01, died18, died21, mobirth, yrbirth, female...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
fit <-with(MHAS0121, Surv(agecensor,died0121))
ID – Unique person identifier
FEMALE – dummy variable for whether the sampled individual identifies as a woman or a man (no other options; note that in some in-person questionnaires, this is based on external classification by interviewers as opposed to self-reported by interviewees).
SCHOOLING – Years of formal schooling.
EDUCLEVEL – Schooling variable coding in intervals, including: no formal schooling (0), 1 – 5 years (15), 6-8 years (68), and 9 or more years (because it is topped at 19 years, value is 919).
LOCSIZE01 – Size of the locality of residence at baseline in 2001, ranging from 100,000+ (1), 15,000 – 99,999 (2), 2,500 – 14,999 (3), and less than 2,500 (4). In Mexico, urban areas are defined as those having at least 2,500 inhabitants (one of the lowest bars for urban I have ever encountered).
MOBIRTH – Month of birth
YRBIRTH – Year of birth
MOINT – Month of baseline interview (in 2001)
YRINT – Year of baseline interview (i.e., 2001)
DIED0121 – Dummy variable for whether person passed away in the inter-wave period (otherwise,censored at re-interview, lost to follow-up, or refused interview, all coded as 0)
MOCENSOR – Month in which person was ‘observed’ for the last time (whether because s/he passed
away, was re-interviewed, rejected the survey, or when we assume was lost to follow-up). YRCENSOR – Year in which person was ‘observed’ for the last time (see MOCENSOR). We will also create other variables in Stata that we will need for the exercise (see accompanying Stata do file):
AGEINT – Age at which person was interviewed at baseline
AGECENSOR – Age the interviewee was when we observed them last.
(1) the variable that measures the time each person was exposed to the risk of death before dying/being censored for other reasons.
Agecensor is the variabel that measures the time each person was exposed to the risk of death before dying or censoring.
(2) the variable that indicates whether the event occurred or not (i.e., whether the person died in the interval); Died0121 is the variable that indicates whether or not th person has died.
(3)the time in which the person entered and exited the risk set. After doing this (or, without it, if your package does not require it):
I. Multivariate parametric proportional hazards models
A. Intercept-Only Exponential Model
Estimate parametric proportional hazards model with an exponential (i.e., constant) with no covariates. Instead of hazard ratios, obtain coefficients. Exponentiate the intercept of the model. (Later on, we will compare this to the crude death rate for people 50+ we will calculate in a later problem set).
Estimate another PH parametric model controlling for FEMALE, SCHOOLING (or
categories of EDUCLEVEL, whatever you prefer, justifying your choice), and LOCSIZE01. Assuming the baseline hazard is distributed:
C. Multiple Models
i. Exponential Model (Covariates)
model03 <-phreg(formula =Surv(agecensor, died0121) ~ sex + educlevel + locsize01, data = MHAS0121, dist ="weibull", shape =1) model03
Call:
phreg(formula = Surv(agecensor, died0121) ~ sex + educlevel +
locsize01, data = MHAS0121, dist = "weibull", shape = 1)
Covariate W.mean Coef Exp(Coef) se(Coef) Wald p
sex 0.450 0.131 1.140 0.032 0.000
educlevel
0 0.276 0 1 (reference)
15 0.356 -0.136 0.872 0.039 0.000
68 0.210 -0.287 0.751 0.047 0.000
919 0.158 -0.458 0.633 0.055 0.000
locsize01
1 0.572 0 1 (reference)
2 0.154 -0.024 0.976 0.046 0.599
3 0.099 -0.050 0.951 0.055 0.356
4 0.175 -0.140 0.869 0.046 0.002
log(scale) 4.857 0.036 0.000
Shape is fixed at 1
Events 3983
Total time at risk 591241
Max. log. likelihood -23852
LR test statistic 93.06
Degrees of freedom 7
Overall p-value 0
model08b <-coxph(formula =Surv(agecensor, died0121) ~ sex +strata(educ9) + locsize01, data = MHAS0121)stargazer::stargazer(model08b, type="text", out="models.txt",apply.coef = exp)
================================================
Dependent variable:
---------------------------
agecensor
------------------------------------------------
sex 1.316***
(0.032)
locsize012 1.005***
(0.046)
locsize013 0.890***
(0.054)
locsize014 0.840***
(0.045)
------------------------------------------------
Observations 7,552
R2 0.011
Max. Possible R2 1.000
Log Likelihood -29,871.940
Wald Test 84.400*** (df = 4)
LR Test 84.099*** (df = 4)
Score (Logrank) Test 84.759*** (df = 4)
================================================
Note: *p<0.1; **p<0.05; ***p<0.01
extractAIC(model08b)
[1] 4.00 59751.88
baseline_hazard <-basehaz(model08a)# Plot the cumulative baseline hazard over timeplot(baseline_hazard$time, baseline_hazard$hazard, type ="l", xlab ="Time", ylab ="Cumulative Baseline Hazard", main ="Cumulative Baseline Hazard under Cox Model", col ="red")
Call:
coxph(formula = Surv(agecensor, died0121) ~ sex + locsize01,
data = MHAS0121[MHAS0121$schooling == 0 & MHAS0121$sex ==
0, ])
coef exp(coef) se(coef) z p
sex NA NA 0.00000 NA NA
locsize012 -0.05284 0.94853 0.10575 -0.500 0.617
locsize013 -0.15156 0.85937 0.12066 -1.256 0.209
locsize014 -0.06388 0.93812 0.08997 -0.710 0.478
Likelihood ratio test=1.77 on 3 df, p=0.6212
n= 1213, number of events= 747
(281 observations deleted due to missingness)
surv_fit <-survfit(model08c)ggsurvplot(surv_fit, data = MHAS0121, conf.int =TRUE, # Plot confidence intervalsrisk.table =TRUE, # Add a risk tableggtheme =theme_minimal(), # Use a minimal themetitle ="Survival Curve: Women with 0 Years of Education and locsize01 = 'Level1'",xlab ="Time", size =0.25,censor.size =2,xlim=c(45,100),ylab ="Survival Probability")
Call:
coxph(formula = Surv(agecensor, died0121) ~ sex + locsize01,
data = MHAS0121[MHAS0121$schooling == 0 & MHAS0121$sex ==
1, ])
coef exp(coef) se(coef) z p
sex NA NA 0.00000 NA NA
locsize012 0.01113 1.01119 0.13329 0.084 0.933
locsize013 -0.06899 0.93334 0.13396 -0.515 0.607
locsize014 -0.12170 0.88542 0.10429 -1.167 0.243
Likelihood ratio test=1.7 on 3 df, p=0.6359
n= 811, number of events= 528
(173 observations deleted due to missingness)
surv_fit <-survfit(model08c)ggsurvplot(surv_fit, data = MHAS0121, conf.int =TRUE, # Plot confidence intervalsrisk.table =TRUE, # Add a risk tableggtheme =theme_minimal(), # Use a minimal themetitle ="Survival Curve: Men with 0 Years of Education and locsize01 = 'Level1'",xlab ="Time",xlim =c(45,100),size =0.25,censor.size =2,ylab ="Survival Probability")
Call:
coxph(formula = Surv(agecensor, died0121) ~ sex + educlevel +
locsize01, data = MHAS0121[MHAS0121$schooling == 6 & MHAS0121$sex ==
0, ])
coef exp(coef) se(coef) z p
sex NA NA 0.000000 NA NA
educlevel15 NA NA 0.000000 NA NA
educlevel68 NA NA 0.000000 NA NA
educlevel919 NA NA 0.000000 NA NA
locsize012 0.266921 1.305937 0.157005 1.700 0.0891
locsize013 0.338700 1.403122 0.230728 1.468 0.1421
locsize014 0.006052 1.006070 0.260260 0.023 0.9814
Likelihood ratio test=4.3 on 3 df, p=0.2311
n= 683, number of events= 298
(196 observations deleted due to missingness)
surv_fit <-survfit(model08c)ggsurvplot(surv_fit, data = MHAS0121, conf.int =TRUE, # Plot confidence intervalsrisk.table =TRUE, # Add a risk tableggtheme =theme_minimal(), # Use a minimal themetitle ="Survival Curve: Women with 6 Years of Education and locsize01 = 'Level1'",xlab ="Time", xlim =c(45,100),ylab ="Survival Probability")
Call:
coxph(formula = Surv(agecensor, died0121) ~ sex + educlevel +
locsize01, data = MHAS0121[MHAS0121$schooling == 6 & MHAS0121$sex ==
1, ])
coef exp(coef) se(coef) z p
sex NA NA 0.00000 NA NA
educlevel15 NA NA 0.00000 NA NA
educlevel68 NA NA 0.00000 NA NA
educlevel919 NA NA 0.00000 NA NA
locsize012 -0.23952 0.78700 0.15223 -1.573 0.1156
locsize013 0.06358 1.06565 0.22378 0.284 0.7763
locsize014 -0.38447 0.68081 0.18541 -2.074 0.0381
Likelihood ratio test=6.65 on 3 df, p=0.0838
n= 655, number of events= 334
(175 observations deleted due to missingness)
surv_fit <-survfit(model08c)ggsurvplot(surv_fit, data = MHAS0121, conf.int =TRUE, # Plot confidence intervalsrisk.table =TRUE,tables.theme =theme_cleantable(),axis.offset =TRUE,# Add a risk tableggtheme =theme_minimal(), # Use a minimal themetitle ="Survival Curve: Men with 6 Years of Education and locsize01 = 'Level1'",xlim =c(45,100),xlab ="Time", ylab ="Survival Probability")
Warning: `as.tibble()` was deprecated in tibble 2.0.0.
ℹ Please use `as_tibble()` instead.
ℹ The signature and semantics have changed, see `?as_tibble`.
ℹ The deprecated feature was likely used in the survminer package.
Please report the issue at <https://github.com/kassambara/survminer/issues>.
Warning: `select_()` was deprecated in dplyr 0.7.0.
ℹ Please use `select()` instead.
ℹ The deprecated feature was likely used in the survminer package.
Please report the issue at <https://github.com/kassambara/survminer/issues>.
Warning: This manual palette can handle a maximum of 10 values. You have
supplied 16.