Question 4

We import the required data set.

kidneyTrans <- read.csv("kidneyTrans.csv")
kidneyTrans <- kidneyTrans %>% filter(age >= 35)
kidneyTrans %>% head() %>% kable()

obs	time	death	gender	race	age
1	1	0	1	1	46
2	5	0	1	1	51
3	7	1	1	1	55
4	9	0	1	1	57
5	13	0	1	1	45
6	13	0	1	1	43

We find that there are 863 observations. The meanings of each variable are as follows:

time means the survival times after transplant measured in days;

death means the state of patient, where 0 represents censoring and 1 represents observation;

gender means the gender of patient, where 1 represents male and 2 represents female;

race means the race of patient, where 1 represents white and 2 represents black.

Question 4.1

For kdiney transplant, it is more appropriate to assume that the hazard function obeys the log-normal distribution. Because the patients have rejection reaction to the newly transplanted organ so the hazard of death increases initially. And after the patients adapt to the new organ, the hazard of death decreases. The hazard function of Weibull distribution is monotonic, and the hazard function of log-normal is unimodal. Therefore, we choose the log-normal distribution for the survival times from kidney transplant to death.

Question 4.2

We filter the patients with \(35\leq\text{age}<55\).

data1 <- kidneyTrans %>% filter(age < 55)

Denote \(\hat{S}_1(t)\) be the Kaplan–Meier estimate of the survival function for patients with \(35\leq\text{age}<55\).

est_km1 <- survfit(Surv(time, death) ~ 1, conf.type = "plain", conf.int = 0.9, data = data1)
summary(est_km1)

## Call: survfit(formula = Surv(time, death) ~ 1, data = data1, conf.type = "plain", 
##     conf.int = 0.9)
## 
##  time n.risk n.event survival std.err lower 90% CI upper 90% CI
##     7    422       1    0.998 0.00237        0.994        1.000
##    10    421       1    0.995 0.00334        0.990        1.000
##    17    417       1    0.993 0.00410        0.986        1.000
##    26    416       1    0.990 0.00473        0.983        0.998
##    28    415       1    0.988 0.00529        0.979        0.997
##    37    412       1    0.986 0.00580        0.976        0.995
##    40    411       1    0.983 0.00626        0.973        0.994
##    44    410       1    0.981 0.00669        0.970        0.992
##    57    407       2    0.976 0.00747        0.964        0.988
##    59    405       1    0.974 0.00783        0.961        0.987
##    62    404       1    0.971 0.00818        0.958        0.985
##    68    400       2    0.966 0.00883        0.952        0.981
##    88    396       1    0.964 0.00914        0.949        0.979
##    91    395       1    0.962 0.00943        0.946        0.977
##    98    393       1    0.959 0.00972        0.943        0.975
##   104    392       1    0.957 0.01000        0.940        0.973
##   106    389       1    0.954 0.01027        0.937        0.971
##   119    381       1    0.952 0.01055        0.934        0.969
##   121    380       1    0.949 0.01081        0.931        0.967
##   150    377       1    0.947 0.01107        0.928        0.965
##   154    374       1    0.944 0.01133        0.925        0.963
##   190    371       1    0.942 0.01158        0.923        0.961
##   228    363       1    0.939 0.01183        0.920        0.958
##   242    357       1    0.936 0.01209        0.916        0.956
##   248    354       1    0.934 0.01234        0.913        0.954
##   249    353       1    0.931 0.01259        0.910        0.952
##   252    351       1    0.928 0.01283        0.907        0.950
##   273    345       1    0.926 0.01307        0.904        0.947
##   297    340       1    0.923 0.01331        0.901        0.945
##   311    338       1    0.920 0.01355        0.898        0.943
##   340    331       1    0.917 0.01379        0.895        0.940
##   344    327       1    0.915 0.01403        0.892        0.938
##   346    326       1    0.912 0.01427        0.888        0.935
##   354    323       1    0.909 0.01450        0.885        0.933
##   470    310       1    0.906 0.01475        0.882        0.930
##   478    309       1    0.903 0.01499        0.879        0.928
##   495    307       1    0.900 0.01522        0.875        0.925
##   614    295       1    0.897 0.01547        0.872        0.923
##   615    294       1    0.894 0.01572        0.868        0.920
##   697    282       1    0.891 0.01598        0.865        0.917
##   776    271       1    0.888 0.01626        0.861        0.914
##   793    269       1    0.884 0.01653        0.857        0.912
##   840    268       1    0.881 0.01679        0.853        0.909
##   875    262       1    0.878 0.01706        0.850        0.906
##   929    254       1    0.874 0.01734        0.846        0.903
##   939    252       1    0.871 0.01762        0.842        0.900
##   943    251       1    0.867 0.01788        0.838        0.897
##   945    250       2    0.860 0.01840        0.830        0.891
##  1016    241       1    0.857 0.01867        0.826        0.888
##  1164    220       1    0.853 0.01899        0.822        0.884
##  1186    218       1    0.849 0.01930        0.817        0.881
##  1191    217       1    0.845 0.01960        0.813        0.877
##  1275    203       1    0.841 0.01994        0.808        0.874
##  1326    198       1    0.837 0.02029        0.803        0.870
##  1331    197       1    0.832 0.02062        0.799        0.866
##  1388    191       1    0.828 0.02097        0.794        0.863
##  1473    185       1    0.824 0.02133        0.789        0.859
##  1777    151       1    0.818 0.02188        0.782        0.854
##  1820    141       1    0.812 0.02248        0.775        0.849
##  1835    140       1    0.807 0.02305        0.769        0.844
##  1877    137       1    0.801 0.02362        0.762        0.840
##  1940    132       1    0.795 0.02421        0.755        0.834
##  2034    122       1    0.788 0.02487        0.747        0.829
##  2276     95       1    0.780 0.02596        0.737        0.822
##  2301     93       1    0.771 0.02700        0.727        0.816
##  2369     88       1    0.763 0.02808        0.716        0.809
##  2414     85       1    0.754 0.02915        0.706        0.802
##  2421     84       1    0.745 0.03015        0.695        0.794
##  2557     74       1    0.735 0.03138        0.683        0.786
##  2567     73       1    0.725 0.03252        0.671        0.778
##  2650     65       1    0.713 0.03388        0.658        0.769
##  2795     44       1    0.697 0.03678        0.637        0.758
##  3146     19       1    0.661 0.04990        0.578        0.743

plot(est_km1, main = "90% Confidence interval without transformation", xlab = "Time (Days)", ylab = "Estimated survival function")

Then we use the plot of \(\mbox{log}\big\{-\mbox{log}\,\hat{S}_1(t)\big\}\) against \(\mbox{log}\,t\) to check whether the Weibull distribution is suitable for the survival times as follows:

y_axis <- log(-log(est_km1$surv))
x_axis <- log(est_km1$time)

ggplot(data.frame(x = x_axis, y = y_axis), mapping = aes(x = x, y = y)) +
  geom_step() +
  geom_smooth(method = 'lm', formula = y ~ x) + 
  labs(title = "Suitability of Weibull Distribution", x = expression(log(t)), y = "log{-log S(t)}") + 
  theme(plot.title = element_text(hjust = 0.5, size = 13), axis.title = element_text(size = 14), 
        axis.text.x = element_text(size = 11), axis.text.y = element_text(size = 12))

We use the plot of \(\mbox{log}\Big\{\hat{S}_1(t)/\big[1-\hat{S}_1(t)\big]\Big\}\) against \(\mbox{log}\,t\) to check whether the log-logistic distribution is suitable for the survival times as follows:

y_axis <- log(est_km1$surv / (1 - est_km1$surv))
x_axis <- log(est_km1$time)

ggplot(data.frame(x = x_axis, y = y_axis), mapping = aes(x = x, y = y)) +
  geom_step() +
  geom_smooth(method = 'lm', formula = y ~ x) + 
  labs(title = "Suitability of Log-logistic Distribution", x = expression(log(t)), y = "log{S(t) / [1-S(t)]}") + 
  theme(plot.title = element_text(hjust = 0.5, size = 13), axis.title = element_text(size = 14), 
        axis.text.x = element_text(size = 11), axis.text.y = element_text(size = 12))

We use the plot of \(\Phi^{-1}\big\{1-\hat{S}_1(t)\big\}\) against \(\mbox{log}\,t\) to check whether the log-logistic distribution is suitable for the survival times as follows:

y_axis <- qnorm(1 - est_km1$surv)
x_axis <- log(est_km1$time)

ggplot(data.frame(x = x_axis, y = y_axis), mapping = aes(x = x, y = y)) +
  geom_step() +
  geom_smooth(method = 'lm', formula = y ~ x) + 
  labs(title = "Suitability of Log-normal Distribution", x = expression(log(t)), y = "log{S(t) / [1-S(t)]}") + 
  theme(plot.title = element_text(hjust = 0.5, size = 13), axis.title = element_text(size = 14), 
        axis.text.x = element_text(size = 11), axis.text.y = element_text(size = 12))

According to the three plots above, we find that the plot of \(\Phi^{-1}\big\{1-\hat{S}_1(t)\big\}\) against \(\mbox{log}\,t\) is better approximated as a straight line, which indicates that the log-normal distribution is most suitable for the survival times. This conclusion is consistent with the guess in question 4.1.

Question 4.3

We filter the patients with \(\text{age}\geq55\).

data2 <- kidneyTrans %>% filter(age >= 55)

Denote \(\hat{S}_2(t)\) be the Kaplan–Meier estimate of the survival function for patients with \(\text{age}\geq55\).

est_km2 <- survfit(Surv(time, death) ~ 1, conf.type = "plain", conf.int = 0.9, data = data2)
summary(est_km2)

## Call: survfit(formula = Surv(time, death) ~ 1, data = data2, conf.type = "plain", 
##     conf.int = 0.9)
## 
##  time n.risk n.event survival std.err lower 90% CI upper 90% CI
##     2    199       1    0.995 0.00501        0.987        1.000
##     7    198       1    0.990 0.00707        0.978        1.000
##    10    195       1    0.985 0.00867        0.971        0.999
##    21    192       1    0.980 0.01003        0.963        0.996
##    26    191       1    0.975 0.01121        0.956        0.993
##    43    190       2    0.964 0.01323        0.943        0.986
##    45    186       1    0.959 0.01414        0.936        0.982
##    50    185       1    0.954 0.01498        0.929        0.979
##    52    183       1    0.949 0.01578        0.923        0.975
##    56    182       1    0.944 0.01653        0.916        0.971
##    62    181       1    0.938 0.01725        0.910        0.967
##    69    179       1    0.933 0.01793        0.904        0.963
##    78    178       1    0.928 0.01858        0.897        0.958
##    79    177       1    0.923 0.01920        0.891        0.954
##    97    172       1    0.917 0.01982        0.885        0.950
##    98    171       1    0.912 0.02042        0.878        0.945
##   106    170       1    0.907 0.02099        0.872        0.941
##   158    162       1    0.901 0.02160        0.865        0.936
##   162    161       1    0.895 0.02217        0.859        0.932
##   206    155       1    0.890 0.02277        0.852        0.927
##   291    144       1    0.883 0.02344        0.845        0.922
##   311    142       1    0.877 0.02408        0.838        0.917
##   334    140       1    0.871 0.02471        0.830        0.912
##   391    134       1    0.864 0.02537        0.823        0.906
##   402    132       1    0.858 0.02601        0.815        0.901
##   421    129       1    0.851 0.02664        0.807        0.895
##   439    127       1    0.844 0.02726        0.800        0.889
##   481    121       1    0.838 0.02792        0.792        0.883
##   490    119       1    0.830 0.02855        0.784        0.877
##   583    111       1    0.823 0.02926        0.775        0.871
##   652    104       1    0.815 0.03003        0.766        0.864
##   730     97       1    0.807 0.03087        0.756        0.857
##   790     92       1    0.798 0.03176        0.746        0.850
##   806     91       1    0.789 0.03260        0.736        0.843
##   864     88       1    0.780 0.03344        0.725        0.835
##   946     82       1    0.771 0.03436        0.714        0.827
##  1013     80       1    0.761 0.03525        0.703        0.819
##  1105     76       1    0.751 0.03618        0.691        0.811
##  1210     73       1    0.741 0.03712        0.680        0.802
##  1357     65       1    0.729 0.03826        0.666        0.792
##  1384     62       1    0.718 0.03941        0.653        0.782
##  1418     60       1    0.706 0.04053        0.639        0.772
##  1509     55       1    0.693 0.04177        0.624        0.761
##  1734     48       1    0.678 0.04332        0.607        0.750
##  2056     31       1    0.656 0.04713        0.579        0.734
##  2171     29       1    0.634 0.05065        0.551        0.717
##  2291     26       1    0.609 0.05425        0.520        0.699
##  2313     24       1    0.584 0.05763        0.489        0.679
##  2489     18       1    0.552 0.06290        0.448        0.655

plot(est_km2, main = "90% Confidence interval without transformation", xlab = "Time (Days)", ylab = "Estimated survival function")

According to the Kaplan–Meier estimate of the survival function \(\hat{S}_1(t)\) and \(\hat{S}_2(t)\). We choose \(p\) to range from \(0.01\) to \(0.33\) with increment \(0.01\). We obtain \(p\)-th quantile of the survival times for the two groups as follows:

km_quantile <- function(time, survival, p) {
  if (p > 1 | p < 0) {
    stop("p should range between 0 and 1.")
  }
  n <- length(time)
  index <- 0
  result <- numeric(length = 1)
  for (i in 1:n) {
    if (survival[i] == 1 - p) {
      result <- (time[i] + time[i + 1]) / 2
      index <- 1
      break
    }
  }
  if (index == 0) {
    for (i in 1:n) {
      if (survival[i] < 1 - p) {
        result <- time[i]
        index <- 1
        break
      }
    }
  }
  if (index == 0) {
    stop("The quantile can not be estimated.")
  }
  return(result)
}

p <- seq(0.01, 0.33, by = 0.01)
index1 <- est_km1$n.censor
time1 <- est_km1$time[index1 == 0]
surv1 <- est_km1$surv[index1 == 0]
t1 <- numeric(length = length(p))
for (i in 1:length(p)) {
  t1[i] <- km_quantile(time1, surv1, p[i])
}
index2 <- est_km2$n.censor
time2 <- est_km2$time[index2 == 0]
surv2 <- est_km2$surv[index2 == 0]
t2 <- numeric(length = length(p))
for (i in 1:length(p)) {
  t2[i] <- km_quantile(time2, surv2, p[i])
}
df <- data.frame(p = p, t1 = t1, t2 = t2)
df %>% kable()

p	t1	t2
0.01	28	7
0.02	57	21
0.03	68	45
0.04	98	45
0.05	154	52
0.06	228	62
0.07	252	78
0.08	344	97
0.09	354	106
0.10	614	206
0.11	776	206
0.12	875	311
0.13	943	391
0.14	1164	402
0.15	1186	439
0.16	1326	481
0.17	1388	583
0.18	1777	652
0.19	1835	730
0.20	1940	790
0.21	2276	806
0.22	2276	946
0.23	2414	1013
0.24	2414	1105
0.25	2421	1210
0.26	2557	1357
0.27	2567	1357
0.28	2650	1384
0.29	2795	1418
0.30	2795	1509
0.31	3146	1734
0.32	3146	1734
0.33	3146	2056

ggplot(data.frame(x = t1, y = t2), aes(x = x, y = y)) + 
  geom_point() + geom_smooth(span = 1) + 
  labs(title = "Q-Q Plot", x = bquote('quantile for' ~ 35 <= ~ 'age' < 55 ~ 'group' ), y = bquote('quantile for age' >= 55 ~ 'group')) + 
  theme(plot.title = element_text(hjust = 0.5, size = 13), axis.title = element_text(size = 14), 
        axis.text.x = element_text(size = 11), axis.text.y = element_text(size = 12))

We find that the Q-Q plot is not a straight line, which indicates that the AFT model may not valid.

Question 4.4

We use the plot of \(\mbox{log}\big\{-\mbox{log}\,S(t)\big\}\) against \(t\) to check the proportional hazards assumptions as follows:

y1 <- log(-log(est_km1$surv))
y2 <- log(-log(est_km2$surv))
t1 <- est_km1$time
t2 <- est_km2$time
ggplot() + 
  geom_step(aes(x = t1, y = y1, color = "35<=age<55"), size = 1) + 
  geom_step(aes(x = t2, y = y2, color = "age>=55"), size = 1) + 
  labs(title = "Kaplan-Meier estimate in the log{log[S(t)]}", x = "Time", y = "log{-log[S(t)]}") + 
  theme(plot.title = element_text(hjust = 0.5, size = 13), axis.title = element_text(size = 14), 
        axis.text.x = element_text(size = 11), axis.text.y = element_text(size = 12))

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

According to the plot above, we find that the two curves demonstrate constant vertical shift, which indicates the proportional hazards assumption is valid. Therefore, the Cox model is appropriate to compare the survival times of the two groups of patients.

Assignment 4 of Survival Analysis

Zhao Yuanfan

2023-12-01

Question 4

Question 4.1

Question 4.2

Question 4.3

Question 4.4