Question 4

Question 4.2

According to the question, we generate the survival times \(\tilde{t}_{ij}\) as follows:

set.seed(1220)
n <- 2000
ni <- sample(2:4, n, replace = TRUE)
N <- sum(ni)
X1 <- runif(N)
X2 <- rgamma(n, 0.5, 0.5) %>% log() %>% rep(time = ni)

u <- runif(N)
t <- sqrt(-log(1-u) / exp(-2 * X1 + X2))
c <- runif(N, 3, 6)
delta <- ifelse(t <= c, 1, 0)
t[t > c] <- c[t > c]

The summary statistics of \(t_{ij}\) and the numbers of 0 and 1 for \(\delta_{ij}\) are as follows:

summary(t)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01915 0.93861 1.90999 2.33499 3.58516 5.99542
table(delta)
## delta
##    0    1 
## 1540 4503

Question 4.3

We fit the Cox PH model with both \(X1\) and \(X2\) as covariates as follows:

df <- data.frame(ID = rep(1:2000, ni), time = t, status = delta, X1 = X1, X2 = X2)
model1 <- coxph(Surv(time, status) ~ X1 + X2, data = df, ties = "breslow")
summary(model1)
## Call:
## coxph(formula = Surv(time, status) ~ X1 + X2, data = df, ties = "breslow")
## 
##   n= 6043, number of events= 4503 
## 
##        coef exp(coef) se(coef)      z Pr(>|z|)    
## X1 -2.06392   0.12696  0.05653 -36.51   <2e-16 ***
## X2  1.00164   2.72274  0.01531  65.42   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##    exp(coef) exp(-coef) lower .95 upper .95
## X1     0.127     7.8768    0.1136    0.1418
## X2     2.723     0.3673    2.6423    2.8057
## 
## Concordance= 0.829  (se = 0.003 )
## Likelihood ratio test= 7021  on 2 df,   p=<2e-16
## Wald test            = 4566  on 2 df,   p=<2e-16
## Score (logrank) test = 4041  on 2 df,   p=<2e-16

We find that the estimates of \(\beta_1\) and \(\beta_2\) are given by \(\hat{\beta}_1=-2.06392\) and \(\hat{\beta}_2=1.00164\), which close to the true value \(\beta_1=-2\) and \(\beta_2=1\).

And we fit the Cox PH model with only \(X1\) as covariate as follows:

model2 <- coxph(Surv(time, status) ~ X1, data = df, ties = "breslow")
summary(model2)
## Call:
## coxph(formula = Surv(time, status) ~ X1, data = df, ties = "breslow")
## 
##   n= 6043, number of events= 4503 
## 
##        coef exp(coef) se(coef)      z Pr(>|z|)    
## X1 -0.90709   0.40370  0.05223 -17.37   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##    exp(coef) exp(-coef) lower .95 upper .95
## X1    0.4037      2.477    0.3644    0.4472
## 
## Concordance= 0.593  (se = 0.004 )
## Likelihood ratio test= 303.6  on 1 df,   p=<2e-16
## Wald test            = 301.6  on 1 df,   p=<2e-16
## Score (logrank) test = 305.5  on 1 df,   p=<2e-16

We find that the estimate of \(\beta_1\) is given by \(\hat{\beta}_1=-0.90709\), which is different from true value. Therefore, \(X2\) is an important variable in this Cox PH model. If important variables are not observed and included in the model, the estimated coefficients of the observed variables are biased towards 0.

Question 4.4

We fit the Cox model with shared frailty as follows:

model3 <- emfrail(Surv(time, status) ~ X1 + cluster(ID), data = df)
summary(model3)
## Call: 
## emfrail(formula = Surv(time, status) ~ X1 + cluster(ID), data = df)
## 
## Regression coefficients:
##        coef exp(coef) se(coef)  adj. se        z p
## X1  -2.0817    0.1247   0.0715   0.0735 -28.3102 0
## Estimated distribution: gamma / left truncation: FALSE 
## 
## Fit summary:
## Commenges-Andersen test for heterogeneity: p-val  0 
## no-frailty Log-likelihood: -36389.92 
## Log-likelihood: -34950.11 
## LRT: 1/2 * pchisq(2880), p-val 0
## 
## Frailty summary:
##                    estimate lower 95% upper 95%
## Var[Z]                2.010     1.859     2.181
## Kendall's tau         0.501     0.482     0.522
## Median concordance    0.513     0.492     0.536
## E[logZ]              -1.278    -1.404    -1.167
## Var[logZ]             4.977     4.359     5.727
## theta                 0.498     0.459     0.538
## Confidence intervals based on the likelihood function

According to the result above, we find that the estimate of \(\beta_1\) is given by \(\hat{\beta}_1=-2.0817\), which is close to the true value. This means that the bias due to omitting important variables in the model is corrected by introducing a shared frailty into the model. And the \(p\)-value of likelihood ratio test show that the frailty term is highly significant. At last, we have \(\hat{\theta}=0.498\), which is close to the true value \(\theta=0.5\).