Question 4

We import the required data set.

kidneyTrans <- read.csv("kidneyTrans.csv")
kidneyTrans <- kidneyTrans %>% filter(age >= 35)
kidneyTrans %>% head() %>% kable()
obs time death gender race age
1 1 0 1 1 46
2 5 0 1 1 51
3 7 1 1 1 55
4 9 0 1 1 57
5 13 0 1 1 45
6 13 0 1 1 43

We find that there are 863 observations. The meanings of each variable are as follows:

time means the survival times after transplant measured in days;

death means the state of patient, where 0 represents censoring and 1 represents observation;

gender means the gender of patient, where 1 represents male and 2 represents female;

race means the race of patient, where 1 represents white and 2 represents black.

Question 4.1

For kdiney transplant, it is more appropriate to assume that the hazard function obeys the log-normal distribution. Because the patients have rejection reaction to the newly transplanted organ so the hazard of death increases initially. And after the patients adapt to the new organ, the hazard of death decreases. The hazard function of Weibull distribution is monotonic, and the hazard function of log-normal is unimodal. Therefore, we choose the log-normal distribution for the survival times from kidney transplant to death.

Question 4.2

We filter the patients with \(35\leq\text{age}<55\).

data1 <- kidneyTrans %>% filter(age < 55)

Denote \(\hat{S}_1(t)\) be the Kaplan–Meier estimate of the survival function for patients with \(35\leq\text{age}<55\).

est_km1 <- survfit(Surv(time, death) ~ 1, conf.type = "plain", conf.int = 0.9, data = data1)
summary(est_km1)
## Call: survfit(formula = Surv(time, death) ~ 1, data = data1, conf.type = "plain", 
##     conf.int = 0.9)
## 
##  time n.risk n.event survival std.err lower 90% CI upper 90% CI
##     7    422       1    0.998 0.00237        0.994        1.000
##    10    421       1    0.995 0.00334        0.990        1.000
##    17    417       1    0.993 0.00410        0.986        1.000
##    26    416       1    0.990 0.00473        0.983        0.998
##    28    415       1    0.988 0.00529        0.979        0.997
##    37    412       1    0.986 0.00580        0.976        0.995
##    40    411       1    0.983 0.00626        0.973        0.994
##    44    410       1    0.981 0.00669        0.970        0.992
##    57    407       2    0.976 0.00747        0.964        0.988
##    59    405       1    0.974 0.00783        0.961        0.987
##    62    404       1    0.971 0.00818        0.958        0.985
##    68    400       2    0.966 0.00883        0.952        0.981
##    88    396       1    0.964 0.00914        0.949        0.979
##    91    395       1    0.962 0.00943        0.946        0.977
##    98    393       1    0.959 0.00972        0.943        0.975
##   104    392       1    0.957 0.01000        0.940        0.973
##   106    389       1    0.954 0.01027        0.937        0.971
##   119    381       1    0.952 0.01055        0.934        0.969
##   121    380       1    0.949 0.01081        0.931        0.967
##   150    377       1    0.947 0.01107        0.928        0.965
##   154    374       1    0.944 0.01133        0.925        0.963
##   190    371       1    0.942 0.01158        0.923        0.961
##   228    363       1    0.939 0.01183        0.920        0.958
##   242    357       1    0.936 0.01209        0.916        0.956
##   248    354       1    0.934 0.01234        0.913        0.954
##   249    353       1    0.931 0.01259        0.910        0.952
##   252    351       1    0.928 0.01283        0.907        0.950
##   273    345       1    0.926 0.01307        0.904        0.947
##   297    340       1    0.923 0.01331        0.901        0.945
##   311    338       1    0.920 0.01355        0.898        0.943
##   340    331       1    0.917 0.01379        0.895        0.940
##   344    327       1    0.915 0.01403        0.892        0.938
##   346    326       1    0.912 0.01427        0.888        0.935
##   354    323       1    0.909 0.01450        0.885        0.933
##   470    310       1    0.906 0.01475        0.882        0.930
##   478    309       1    0.903 0.01499        0.879        0.928
##   495    307       1    0.900 0.01522        0.875        0.925
##   614    295       1    0.897 0.01547        0.872        0.923
##   615    294       1    0.894 0.01572        0.868        0.920
##   697    282       1    0.891 0.01598        0.865        0.917
##   776    271       1    0.888 0.01626        0.861        0.914
##   793    269       1    0.884 0.01653        0.857        0.912
##   840    268       1    0.881 0.01679        0.853        0.909
##   875    262       1    0.878 0.01706        0.850        0.906
##   929    254       1    0.874 0.01734        0.846        0.903
##   939    252       1    0.871 0.01762        0.842        0.900
##   943    251       1    0.867 0.01788        0.838        0.897
##   945    250       2    0.860 0.01840        0.830        0.891
##  1016    241       1    0.857 0.01867        0.826        0.888
##  1164    220       1    0.853 0.01899        0.822        0.884
##  1186    218       1    0.849 0.01930        0.817        0.881
##  1191    217       1    0.845 0.01960        0.813        0.877
##  1275    203       1    0.841 0.01994        0.808        0.874
##  1326    198       1    0.837 0.02029        0.803        0.870
##  1331    197       1    0.832 0.02062        0.799        0.866
##  1388    191       1    0.828 0.02097        0.794        0.863
##  1473    185       1    0.824 0.02133        0.789        0.859
##  1777    151       1    0.818 0.02188        0.782        0.854
##  1820    141       1    0.812 0.02248        0.775        0.849
##  1835    140       1    0.807 0.02305        0.769        0.844
##  1877    137       1    0.801 0.02362        0.762        0.840
##  1940    132       1    0.795 0.02421        0.755        0.834
##  2034    122       1    0.788 0.02487        0.747        0.829
##  2276     95       1    0.780 0.02596        0.737        0.822
##  2301     93       1    0.771 0.02700        0.727        0.816
##  2369     88       1    0.763 0.02808        0.716        0.809
##  2414     85       1    0.754 0.02915        0.706        0.802
##  2421     84       1    0.745 0.03015        0.695        0.794
##  2557     74       1    0.735 0.03138        0.683        0.786
##  2567     73       1    0.725 0.03252        0.671        0.778
##  2650     65       1    0.713 0.03388        0.658        0.769
##  2795     44       1    0.697 0.03678        0.637        0.758
##  3146     19       1    0.661 0.04990        0.578        0.743
plot(est_km1, main = "90% Confidence interval without transformation", xlab = "Time (Days)", ylab = "Estimated survival function")

Then we use the plot of \(\mbox{log}\big\{-\mbox{log}\,\hat{S}_1(t)\big\}\) against \(\mbox{log}\,t\) to check whether the Weibull distribution is suitable for the survival times as follows:

y_axis <- log(-log(est_km1$surv))
x_axis <- log(est_km1$time)

ggplot(data.frame(x = x_axis, y = y_axis), mapping = aes(x = x, y = y)) +
  geom_step() +
  geom_smooth(method = 'lm', formula = y ~ x) + 
  labs(title = "Suitability of Weibull Distribution", x = expression(log(t)), y = "log{-log S(t)}") + 
  theme(plot.title = element_text(hjust = 0.5, size = 13), axis.title = element_text(size = 14), 
        axis.text.x = element_text(size = 11), axis.text.y = element_text(size = 12))

We use the plot of \(\mbox{log}\Big\{\hat{S}_1(t)/\big[1-\hat{S}_1(t)\big]\Big\}\) against \(\mbox{log}\,t\) to check whether the log-logistic distribution is suitable for the survival times as follows:

y_axis <- log(est_km1$surv / (1 - est_km1$surv))
x_axis <- log(est_km1$time)

ggplot(data.frame(x = x_axis, y = y_axis), mapping = aes(x = x, y = y)) +
  geom_step() +
  geom_smooth(method = 'lm', formula = y ~ x) + 
  labs(title = "Suitability of Log-logistic Distribution", x = expression(log(t)), y = "log{S(t) / [1-S(t)]}") + 
  theme(plot.title = element_text(hjust = 0.5, size = 13), axis.title = element_text(size = 14), 
        axis.text.x = element_text(size = 11), axis.text.y = element_text(size = 12))

We use the plot of \(\Phi^{-1}\big\{1-\hat{S}_1(t)\big\}\) against \(\mbox{log}\,t\) to check whether the log-logistic distribution is suitable for the survival times as follows:

y_axis <- qnorm(1 - est_km1$surv)
x_axis <- log(est_km1$time)

ggplot(data.frame(x = x_axis, y = y_axis), mapping = aes(x = x, y = y)) +
  geom_step() +
  geom_smooth(method = 'lm', formula = y ~ x) + 
  labs(title = "Suitability of Log-normal Distribution", x = expression(log(t)), y = "log{S(t) / [1-S(t)]}") + 
  theme(plot.title = element_text(hjust = 0.5, size = 13), axis.title = element_text(size = 14), 
        axis.text.x = element_text(size = 11), axis.text.y = element_text(size = 12))

According to the three plots above, we find that the plot of \(\Phi^{-1}\big\{1-\hat{S}_1(t)\big\}\) against \(\mbox{log}\,t\) is better approximated as a straight line, which indicates that the log-normal distribution is most suitable for the survival times. This conclusion is consistent with the guess in question 4.1.

Question 4.3

We filter the patients with \(\text{age}\geq55\).

data2 <- kidneyTrans %>% filter(age >= 55)

Denote \(\hat{S}_2(t)\) be the Kaplan–Meier estimate of the survival function for patients with \(\text{age}\geq55\).

est_km2 <- survfit(Surv(time, death) ~ 1, conf.type = "plain", conf.int = 0.9, data = data2)
summary(est_km2)
## Call: survfit(formula = Surv(time, death) ~ 1, data = data2, conf.type = "plain", 
##     conf.int = 0.9)
## 
##  time n.risk n.event survival std.err lower 90% CI upper 90% CI
##     2    199       1    0.995 0.00501        0.987        1.000
##     7    198       1    0.990 0.00707        0.978        1.000
##    10    195       1    0.985 0.00867        0.971        0.999
##    21    192       1    0.980 0.01003        0.963        0.996
##    26    191       1    0.975 0.01121        0.956        0.993
##    43    190       2    0.964 0.01323        0.943        0.986
##    45    186       1    0.959 0.01414        0.936        0.982
##    50    185       1    0.954 0.01498        0.929        0.979
##    52    183       1    0.949 0.01578        0.923        0.975
##    56    182       1    0.944 0.01653        0.916        0.971
##    62    181       1    0.938 0.01725        0.910        0.967
##    69    179       1    0.933 0.01793        0.904        0.963
##    78    178       1    0.928 0.01858        0.897        0.958
##    79    177       1    0.923 0.01920        0.891        0.954
##    97    172       1    0.917 0.01982        0.885        0.950
##    98    171       1    0.912 0.02042        0.878        0.945
##   106    170       1    0.907 0.02099        0.872        0.941
##   158    162       1    0.901 0.02160        0.865        0.936
##   162    161       1    0.895 0.02217        0.859        0.932
##   206    155       1    0.890 0.02277        0.852        0.927
##   291    144       1    0.883 0.02344        0.845        0.922
##   311    142       1    0.877 0.02408        0.838        0.917
##   334    140       1    0.871 0.02471        0.830        0.912
##   391    134       1    0.864 0.02537        0.823        0.906
##   402    132       1    0.858 0.02601        0.815        0.901
##   421    129       1    0.851 0.02664        0.807        0.895
##   439    127       1    0.844 0.02726        0.800        0.889
##   481    121       1    0.838 0.02792        0.792        0.883
##   490    119       1    0.830 0.02855        0.784        0.877
##   583    111       1    0.823 0.02926        0.775        0.871
##   652    104       1    0.815 0.03003        0.766        0.864
##   730     97       1    0.807 0.03087        0.756        0.857
##   790     92       1    0.798 0.03176        0.746        0.850
##   806     91       1    0.789 0.03260        0.736        0.843
##   864     88       1    0.780 0.03344        0.725        0.835
##   946     82       1    0.771 0.03436        0.714        0.827
##  1013     80       1    0.761 0.03525        0.703        0.819
##  1105     76       1    0.751 0.03618        0.691        0.811
##  1210     73       1    0.741 0.03712        0.680        0.802
##  1357     65       1    0.729 0.03826        0.666        0.792
##  1384     62       1    0.718 0.03941        0.653        0.782
##  1418     60       1    0.706 0.04053        0.639        0.772
##  1509     55       1    0.693 0.04177        0.624        0.761
##  1734     48       1    0.678 0.04332        0.607        0.750
##  2056     31       1    0.656 0.04713        0.579        0.734
##  2171     29       1    0.634 0.05065        0.551        0.717
##  2291     26       1    0.609 0.05425        0.520        0.699
##  2313     24       1    0.584 0.05763        0.489        0.679
##  2489     18       1    0.552 0.06290        0.448        0.655
plot(est_km2, main = "90% Confidence interval without transformation", xlab = "Time (Days)", ylab = "Estimated survival function")

According to the Kaplan–Meier estimate of the survival function \(\hat{S}_1(t)\) and \(\hat{S}_2(t)\). We choose \(p\) to range from \(0.01\) to \(0.33\) with increment \(0.01\). We obtain \(p\)-th quantile of the survival times for the two groups as follows:

km_quantile <- function(time, survival, p) {
  if (p > 1 | p < 0) {
    stop("p should range between 0 and 1.")
  }
  n <- length(time)
  index <- 0
  result <- numeric(length = 1)
  for (i in 1:n) {
    if (survival[i] == 1 - p) {
      result <- (time[i] + time[i + 1]) / 2
      index <- 1
      break
    }
  }
  if (index == 0) {
    for (i in 1:n) {
      if (survival[i] < 1 - p) {
        result <- time[i]
        index <- 1
        break
      }
    }
  }
  if (index == 0) {
    stop("The quantile can not be estimated.")
  }
  return(result)
}

p <- seq(0.01, 0.33, by = 0.01)
index1 <- est_km1$n.censor
time1 <- est_km1$time[index1 == 0]
surv1 <- est_km1$surv[index1 == 0]
t1 <- numeric(length = length(p))
for (i in 1:length(p)) {
  t1[i] <- km_quantile(time1, surv1, p[i])
}
index2 <- est_km2$n.censor
time2 <- est_km2$time[index2 == 0]
surv2 <- est_km2$surv[index2 == 0]
t2 <- numeric(length = length(p))
for (i in 1:length(p)) {
  t2[i] <- km_quantile(time2, surv2, p[i])
}
df <- data.frame(p = p, t1 = t1, t2 = t2)
df %>% kable()
p t1 t2
0.01 28 7
0.02 57 21
0.03 68 45
0.04 98 45
0.05 154 52
0.06 228 62
0.07 252 78
0.08 344 97
0.09 354 106
0.10 614 206
0.11 776 206
0.12 875 311
0.13 943 391
0.14 1164 402
0.15 1186 439
0.16 1326 481
0.17 1388 583
0.18 1777 652
0.19 1835 730
0.20 1940 790
0.21 2276 806
0.22 2276 946
0.23 2414 1013
0.24 2414 1105
0.25 2421 1210
0.26 2557 1357
0.27 2567 1357
0.28 2650 1384
0.29 2795 1418
0.30 2795 1509
0.31 3146 1734
0.32 3146 1734
0.33 3146 2056
ggplot(data.frame(x = t1, y = t2), aes(x = x, y = y)) + 
  geom_point() + geom_smooth(span = 1) + 
  labs(title = "Q-Q Plot", x = bquote('quantile for' ~ 35 <= ~ 'age' < 55 ~ 'group' ), y = bquote('quantile for age' >= 55 ~ 'group')) + 
  theme(plot.title = element_text(hjust = 0.5, size = 13), axis.title = element_text(size = 14), 
        axis.text.x = element_text(size = 11), axis.text.y = element_text(size = 12))

We find that the Q-Q plot is not a straight line, which indicates that the AFT model may not valid.

Question 4.4

We use the plot of \(\mbox{log}\big\{-\mbox{log}\,S(t)\big\}\) against \(t\) to check the proportional hazards assumptions as follows:

y1 <- log(-log(est_km1$surv))
y2 <- log(-log(est_km2$surv))
t1 <- est_km1$time
t2 <- est_km2$time
ggplot() + 
  geom_step(aes(x = t1, y = y1, color = "35<=age<55"), size = 1) + 
  geom_step(aes(x = t2, y = y2, color = "age>=55"), size = 1) + 
  labs(title = "Kaplan-Meier estimate in the log{log[S(t)]}", x = "Time", y = "log{-log[S(t)]}") + 
  theme(plot.title = element_text(hjust = 0.5, size = 13), axis.title = element_text(size = 14), 
        axis.text.x = element_text(size = 11), axis.text.y = element_text(size = 12))
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

According to the plot above, we find that the two curves demonstrate constant vertical shift, which indicates the proportional hazards assumption is valid. Therefore, the Cox model is appropriate to compare the survival times of the two groups of patients.