1. Table of Contents


This document presents a non-exhaustive list of survival analysis and descriptive modelling methods for a two-group right-censored data with time-independent variables using various helpful packages in R.

1.1 Sample Data


The Leukemia dataset from the book Survival Analysis : A Self-Learning Text was used for this illustrated example.

Preliminary dataset assessment:

[A] 42 Rows (observations composed of leukemia patients enrolled in the study)

[B] 4 Columns (variables)
     [B.1] 1/4 Factor variable = Group (factor)
          [B.1.1] Placebo = placebo drug treatment
          [B.1.2] Treatment = new drug treatment
     [B.2] 1/4 Censoring variable = Status (factor)
          [B.2.1] 1 = event, patient went out of remission before the end of the study
          [B.2.2] 0 = censored or non-event, patient remains in remission until the end of the study, lost to follow-up, experiences a different event that makes further follow-up impossible or withdraws before the end of the study
     [B.3] 1/4 Response variables = Week (numeric)
          [B.3.1] Week = time in weeks a patient is in remission up to the point of the patient’s either going out of remission or being censored
     [B.4] 1/4 Covariate variable = WBC (numeric)
          [B.4.1] WBC = logarithm-transformed white blood cell measurement data gathered before the study which is also considered an important survival predictor in leukemia patients
## [1] 42  5
## 'data.frame':    42 obs. of  5 variables:
##  $ ID    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Group : Factor w/ 2 levels "Placebo","Treatment": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Week  : int  6 6 6 7 10 13 16 22 23 6 ...
##  $ Status: int  1 1 1 1 1 1 1 1 1 0 ...
##  $ WBC   : num  2.31 4.06 3.28 4.43 2.96 2.88 3.6 2.32 2.57 3.2 ...
##        ID              Group         Week           Status      
##  Min.   : 1.00   Placebo  :21   Min.   : 1.00   Min.   :0.0000  
##  1st Qu.:11.25   Treatment:21   1st Qu.: 6.00   1st Qu.:0.0000  
##  Median :21.50                  Median :10.50   Median :1.0000  
##  Mean   :21.50                  Mean   :12.88   Mean   :0.7143  
##  3rd Qu.:31.75                  3rd Qu.:18.50   3rd Qu.:1.0000  
##  Max.   :42.00                  Max.   :35.00   Max.   :1.0000  
##       WBC       
##  Min.   :1.450  
##  1st Qu.:2.303  
##  Median :2.800  
##  Mean   :2.930  
##  3rd Qu.:3.490  
##  Max.   :5.000
##   Column.Index Column.Name Column.Type
## 1            1        Week     integer
## 2            2      Status     integer
## 3            3         WBC     numeric
## 4            4 StatusLevel      factor

1.2 Data Quality Assessment


Data quality assessment:

[A] No missing observations noted for any response and covariate variable.

[B] No low variance (Unique.Count.Ratio<0.01 or First.Second.Mode.Ratio>5) noted for any response variable.

[C] No high skewness (Skewness>3 or Skewness<(-3)) noted for any response variable.
##   Column.Index Column.Name Column.Type Row.Count NA.Count Fill.Rate
## 1            1        Week     integer        42        0     1.000
## 2            2      Status     integer        42        0     1.000
## 3            3         WBC     numeric        42        0     1.000
## 4            4 StatusLevel      factor        42        0     1.000
## [1] "No missing observations noted."
##################################
# Formulating a data quality assessment summary for response variables (numeric)
##################################

if (length(names(DQA.Variables.Response))>0) {
  
  ##################################
  # Formulating a function to determine the first mode
  ##################################
  FirstModes <- function(x) {
    ux <- unique(na.omit(x))
    tab <- tabulate(match(x, ux))
    ux[tab == max(tab)]
  }

  ##################################
  # Formulating a function to determine the second mode
  ##################################
  SecondModes <- function(x) {
    ux <- unique(na.omit(x))
    tab <- tabulate(match(x, ux))
    fm = ux[tab == max(tab)]
    sm = na.omit(x)[!(na.omit(x) %in% fm)]
    usm <- unique(sm)
    tabsm <- tabulate(match(sm, usm))
    usm[tabsm == max(tabsm)]
  }
  
  (DQA.Variables.Response.Summary <- data.frame(
  Column.Name= names(DQA.Variables.Response), 
  Column.Type=sapply(DQA.Variables.Response, function(x) class(x)), 
  Unique.Count=sapply(DQA.Variables.Response, function(x) length(unique(x))),
  Unique.Count.Ratio=sapply(DQA.Variables.Response, function(x) format(round((length(unique(x))/nrow(DQA.Variables.Response)),3), nsmall=3)),
  First.Mode.Value=sapply(DQA.Variables.Response, function(x) format(round((FirstModes(x)[1]),3),nsmall=3)),
  Second.Mode.Value=sapply(DQA.Variables.Response, function(x) format(round((SecondModes(x)[1]),3),nsmall=3)),
  First.Mode.Count=sapply(DQA.Variables.Response, function(x) sum(na.omit(x) == FirstModes(x)[1])),
  Second.Mode.Count=sapply(DQA.Variables.Response, function(x) sum(na.omit(x) == SecondModes(x)[1])),
  First.Second.Mode.Ratio=sapply(DQA.Variables.Response, function(x) format(round((sum(na.omit(x) == FirstModes(x)[1])/sum(na.omit(x) == SecondModes(x)[1])),3), nsmall=3)),
  Minimum=sapply(DQA.Variables.Response, function(x) format(round(min(x,na.rm = TRUE),3), nsmall=3)),
  Mean=sapply(DQA.Variables.Response, function(x) format(round(mean(x,na.rm = TRUE),3), nsmall=3)),
  Median=sapply(DQA.Variables.Response, function(x) format(round(median(x,na.rm = TRUE),3), nsmall=3)),
  Maximum=sapply(DQA.Variables.Response, function(x) format(round(max(x,na.rm = TRUE),3), nsmall=3)),
  Skewness=sapply(DQA.Variables.Response, function(x) format(round(skewness(x,na.rm = TRUE),3), nsmall=3)),
  Kurtosis=sapply(DQA.Variables.Response, function(x) format(round(kurtosis(x,na.rm = TRUE),3), nsmall=3)),
  Percentile25th=sapply(DQA.Variables.Response, function(x) format(round(quantile(x,probs=0.25,na.rm = TRUE),3), nsmall=3)),
  Percentile75th=sapply(DQA.Variables.Response, function(x) format(round(quantile(x,probs=0.75,na.rm = TRUE),3), nsmall=3)),
  row.names=NULL)
  )  
  
}
##   Column.Name Column.Type Unique.Count Unique.Count.Ratio First.Mode.Value
## 1    DQA$Week     integer           24              0.571            6.000
##   Second.Mode.Value First.Mode.Count Second.Mode.Count First.Second.Mode.Ratio
## 1            11.000                4                 3                   1.333
##   Minimum   Mean Median Maximum Skewness Kurtosis Percentile25th Percentile75th
## 1   1.000 12.881 10.500  35.000    0.876    2.870          6.000         18.500
## [1] "No low variance numeric predictors due to high first-second mode ratio noted."
## [1] "No low variance numeric predictors due to low unique count ratio noted."
## [1] "No skewed response variables noted."

1.3 Research Question


Using the Leukemia dataset, survival analysis and modelling will be conducted to investigate the following :

[A] Research Question : What is the probability that a leukemia patient under the Treatment and Placebo groups will survive (remain in remission) after a specified number of weeks?
     [A.1] Factor variable = Group
          [A.1.1] Placebo = placebo drug treatment
          [A.1.2] New = new drug treatment
     [A.2] Response variable = Week (numeric)
     [A.3] Censoring variable = Status (factor)
          [A.3.1] 1 = event, patient went out of remission before the end of the study
          [A.3.2] 0 = censored or non-event, patient remains in remission until the end of the study, lost to follow-up, experiences a different event that makes further follow-up impossible or withdraws before the end of the study

[B] Research Question : Are there differences in survival between leukemia patients under the Treatment and Placebo groups?
     [B.1] Factor variable = Group
          [B.1.1] Placebo = placebo drug treatment
          [B.1.2] Treatment = new drug treatment
     [B.2] Response variable = Week (numeric)
     [B.3] Censoring variable = Status (factor)
          [B.3.1] 1 = event, patient went out of remission before the end of the study
          [B.3.2] 0 = censored or non-event, patient remains in remission until the end of the study, lost to follow-up, experiences a different event that makes further follow-up impossible or withdraws before the end of the study

[C] Research Question : What is the impact of the clinical characteristic WBC on the survival of leukemia patients under the Treatment and Placebo groups?
     [C.1] Factor variable = Group
          [C.1.1] Placebo = placebo drug treatment
          [C.1.2] Treatment = new drug treatment
     [C.2] Response variable = Week (numeric)
     [C.3] Censoring variable = Status (factor)
          [C.3.1] 1 = event, patient went out of remission before the end of the study
          [C.3.2] 0 = censored or non-event, patient remains in remission until the end of the study, lost to follow-up, experiences a different event that makes further follow-up impossible or withdraws before the end of the study
     [C.4] Covariate variable = WBC (numeric) = logarithm-transformed white blood cell measurement data gathered before the study which is also considered an important survival predictor in leukemia patients

1.4 Data Exploration


Exploratory and comparative analyses of the response variable Week by factor variable Group showed that :

[A] Differences in the proportion of censored and uncensored data (p<0.001, Pearson’s Chi-squared Test) observed between Group=Treatment (censored=57%, uncensored=43%) and Group=Placebo (censored=0%, uncensored=100%).

[B] With both censored and uncensored data considered, higher median survival time Week (p=0.004, Wilcoxon Rank Sum Test) was observed for Group=Treatment (16) as compared to Group=Placebo (8). Lower median white blood cell measurement WBC (p=0.047, Wilcoxon Rank Sum Test) was also noted for Group=Treatment (2.57) as compared to Group=Placebo (3.06).

[C] With only the uncensored considered, higher mean survival time Week observed for Group=Treatment (12.11) as compared to Group=Placebo (8.67). Lower mean hazard rate was also noted for Group=Treatment (2.51) as compared to Group=Placebo (11.53).

Kaplan-Meier survival curve analysis of the response variable Week by factor variable Group (with censoring variable Status considered) showed that :

[A] Differences in the Kaplan-Meier survival curves (p<0.0001, Log Rank Test) were noted between Group=Treatment (median survival time=23) and Group=Placebo (median survival time=8).

[B] Higher survival time was consistently observed for leukemia patients under Group=Treatment as compared to those under Group=Placebo at various sampled durations including Week=5 (Treatment=100%, Placebo=57%), Week=10 (Treatment=75%, Placebo=38%), Week=15 (Treatment=69%, Placebo=14%), Week=20 (Treatment=63%, Placebo=10%) and Week=23 (Treatment=45%, Placebo=0%).

1.4.1 Descriptive Statistics

##       Group Week Status  WBC StatusLevel
## 1 Treatment    6      1 2.31           1
## 2 Treatment    6      1 4.06           1
## 3 Treatment    6      1 3.28           1
## 4 Treatment    7      1 4.43           1
## 5 Treatment   10      1 2.96           1
## 6 Treatment   13      1 2.88           1
Characteristic Overall, N = 421 Placebo, N = 211 Treatment, N = 211 p-value2
Week 0.004
N 42 21 21
Mean (SD) 13 (9) 9 (6) 17 (10)
Median (IQR) 10 (6, 18) 8 (4, 12) 16 (9, 23)
Range 1, 35 1, 23 6, 35
StatusLevel <0.001
0 12 (29%) 0 (0%) 12 (57%)
1 30 (71%) 21 (100%) 9 (43%)
WBC 0.047
N 42 21 21
Mean (SD) 2.93 (0.92) 3.22 (0.97) 2.64 (0.77)
Median (IQR) 2.80 (2.30, 3.49) 3.06 (2.42, 3.97) 2.57 (2.16, 2.96)
Range 1.45, 5.00 1.50, 5.00 1.45, 4.43
1 n (%)
2 Wilcoxon rank sum test; Pearson's Chi-squared test
Group Mean.Uncensored.Survival.Time Total.Uncensored.Survival.Time Total.Survival.Time Mean.Hazard.Rate
Placebo 8.666667 21 182 11.538462
Treatment 12.111111 9 359 2.506964

1.4.2 Kaplan-Meier Survival Curve

##       Group Week Status  WBC StatusLevel
## 1 Treatment    6      1 2.31           1
## 2 Treatment    6      1 4.06           1
## 3 Treatment    6      1 3.28           1
## 4 Treatment    7      1 4.43           1
## 5 Treatment   10      1 2.96           1
## 6 Treatment   13      1 2.88           1
## Call: survfit(formula = Surv(Week, Status) ~ 1, data = DBP.Analysis)
## 
##       n  events  median 0.95LCL 0.95UCL 
##      42      30      12       8      22
Characteristic Week 5 Week 10 Week 15 Week 20 Week 23
Overall 79% (67%, 92%) 57% (43%, 74%) 40% (27%, 59%) 34% (22%, 53%) 19% (9.1%, 39%)
## Call: survfit(formula = Surv(Week, Status) ~ 1, data = DBP.Analysis)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1     42       2    0.952  0.0329       0.8901        1.000
##     2     40       2    0.905  0.0453       0.8202        0.998
##     3     38       1    0.881  0.0500       0.7883        0.985
##     4     37       2    0.833  0.0575       0.7279        0.954
##     5     35       2    0.786  0.0633       0.6709        0.920
##     6     33       3    0.714  0.0697       0.5899        0.865
##     7     29       1    0.690  0.0715       0.5628        0.845
##     8     28       4    0.591  0.0764       0.4588        0.762
##    10     23       1    0.565  0.0773       0.4325        0.739
##    11     21       2    0.512  0.0788       0.3783        0.692
##    12     18       2    0.455  0.0796       0.3227        0.641
##    13     16       1    0.426  0.0795       0.2958        0.615
##    15     15       1    0.398  0.0791       0.2694        0.588
##    16     14       1    0.369  0.0784       0.2437        0.560
##    17     13       1    0.341  0.0774       0.2186        0.532
##    22      9       2    0.265  0.0765       0.1507        0.467
##    23      7       2    0.189  0.0710       0.0909        0.395
##    records      n.max    n.start     events     *rmean *se(rmean)     median 
##  42.000000  42.000000  42.000000  30.000000  15.339334   1.860218  12.000000 
##    0.95LCL    0.95UCL 
##   8.000000  22.000000
##    time n.risk n.event n.censor      surv    std.err     upper      lower
## 1     1     42       2        0 0.9523810 0.03450328 1.0000000 0.89010544
## 2     2     40       2        0 0.9047619 0.05006262 0.9980394 0.82020220
## 3     3     38       1        0 0.8809524 0.05672304 0.9845441 0.78826037
## 4     4     37       2        0 0.8333333 0.06900656 0.9540195 0.72791433
## 5     5     35       2        0 0.7857143 0.08058230 0.9201453 0.67092330
## 6     6     33       3        1 0.7142857 0.09759001 0.8648499 0.58993369
## 7     7     29       1        0 0.6896552 0.10370794 0.8451005 0.56280201
## 8     8     28       4        0 0.5911330 0.12925834 0.7615705 0.45883899
## 9     9     24       0        1 0.5911330 0.12925834 0.7615705 0.45883899
## 10   10     23       1        1 0.5654316 0.13668944 0.7391461 0.43254350
## 11   11     21       2        1 0.5115809 0.15393678 0.6917443 0.37834076
## 12   12     18       2        0 0.4547386 0.17504565 0.6408567 0.32267306
## 13   13     16       1        0 0.4263175 0.18656807 0.6145258 0.29575091
## 14   15     15       1        0 0.3978963 0.19892096 0.5876134 0.26943131
## 15   16     14       1        0 0.3694751 0.21228296 0.5601196 0.24371913
## 16   17     13       1        1 0.3410540 0.22687951 0.5320388 0.21862656
## 17   19     11       0        1 0.3410540 0.22687951 0.5320388 0.21862656
## 18   20     10       0        1 0.3410540 0.22687951 0.5320388 0.21862656
## 19   22      9       2        0 0.2652642 0.28847936 0.4669095 0.15070392
## 20   23      7       2        0 0.1894744 0.37465077 0.3948698 0.09091746
## 21   25      5       0        1 0.1894744 0.37465077 0.3948698 0.09091746
## 22   32      4       0        2 0.1894744 0.37465077 0.3948698 0.09091746
## 23   34      2       0        1 0.1894744 0.37465077 0.3948698 0.09091746
## 24   35      1       0        1 0.1894744 0.37465077 0.3948698 0.09091746

## Call: survfit(formula = Surv(Week, Status) ~ Group, data = DBP.Analysis)
## 
##                  n events median 0.95LCL 0.95UCL
## Group=Placebo   21     21      8       4      12
## Group=Treatment 21      9     23      16      NA
Characteristic Week 5 Week 10 Week 15 Week 20 Week 23
Group
Placebo 57% (39%, 83%) 38% (22%, 66%) 14% (5.0%, 41%) 9.5% (2.5%, 36%) 0% (0.7%, 32%)
Treatment 100% (100%, 100%) 75% (59%, 97%) 69% (51%, 93%) 63% (44%, 90%) 45% (25%, 81%)
## Call: survfit(formula = Surv(Week, Status) ~ Group, data = DBP.Analysis)
## 
##                 Group=Placebo 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1     21       2   0.9048  0.0641      0.78754        1.000
##     2     19       2   0.8095  0.0857      0.65785        0.996
##     3     17       1   0.7619  0.0929      0.59988        0.968
##     4     16       2   0.6667  0.1029      0.49268        0.902
##     5     14       2   0.5714  0.1080      0.39455        0.828
##     8     12       4   0.3810  0.1060      0.22085        0.657
##    11      8       2   0.2857  0.0986      0.14529        0.562
##    12      6       2   0.1905  0.0857      0.07887        0.460
##    15      4       1   0.1429  0.0764      0.05011        0.407
##    17      3       1   0.0952  0.0641      0.02549        0.356
##    22      2       1   0.0476  0.0465      0.00703        0.322
##    23      1       1   0.0000     NaN           NA           NA
## 
##                 Group=Treatment 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     6     21       3    0.857  0.0764        0.720        1.000
##     7     17       1    0.807  0.0869        0.653        0.996
##    10     15       1    0.753  0.0963        0.586        0.968
##    13     12       1    0.690  0.1068        0.510        0.935
##    16     11       1    0.627  0.1141        0.439        0.896
##    22      7       1    0.538  0.1282        0.337        0.858
##    23      6       1    0.448  0.1346        0.249        0.807
##                 records n.max n.start events    *rmean *se(rmean) median
## Group=Placebo        21    21      21     21  8.666667   1.377390      8
## Group=Treatment      21    21      21      9 23.287395   2.827468     23
##                 0.95LCL 0.95UCL
## Group=Placebo         4      12
## Group=Treatment      16      NA
##    time n.risk n.event n.censor       surv    std.err     upper      lower
## 1     1     21       2        0 0.90476190 0.07079923 1.0000000 0.78753505
## 2     2     19       2        0 0.80952381 0.10585122 0.9961629 0.65785306
## 3     3     17       1        0 0.76190476 0.12198751 0.9676909 0.59988048
## 4     4     16       2        0 0.66666667 0.15430335 0.9020944 0.49268063
## 5     5     14       2        0 0.57142857 0.18898224 0.8276066 0.39454812
## 6     8     12       4        0 0.38095238 0.27817432 0.6571327 0.22084536
## 7    11      8       2        0 0.28571429 0.34503278 0.5618552 0.14529127
## 8    12      6       2        0 0.19047619 0.44986771 0.4600116 0.07887014
## 9    15      4       1        0 0.14285714 0.53452248 0.4072755 0.05010898
## 10   17      3       1        0 0.09523810 0.67259271 0.3558956 0.02548583
## 11   22      2       1        0 0.04761905 0.97590007 0.3224544 0.00703223
## 12   23      1       1        0 0.00000000        Inf        NA         NA
## 13    6     21       3        1 0.85714286 0.08908708 1.0000000 0.71981708
## 14    7     17       1        0 0.80672269 0.10776353 0.9964437 0.65312422
## 15    9     16       0        1 0.80672269 0.10776353 0.9964437 0.65312422
## 16   10     15       1        1 0.75294118 0.12796438 0.9675748 0.58591898
## 17   11     13       0        1 0.75294118 0.12796438 0.9675748 0.58591898
## 18   13     12       1        0 0.69019608 0.15475995 0.9347692 0.50961310
## 19   16     11       1        0 0.62745098 0.18177335 0.8959949 0.43939392
## 20   17     10       0        1 0.62745098 0.18177335 0.8959949 0.43939392
## 21   19      9       0        1 0.62745098 0.18177335 0.8959949 0.43939392
## 22   20      8       0        1 0.62745098 0.18177335 0.8959949 0.43939392
## 23   22      7       1        0 0.53781513 0.23843463 0.8582008 0.33703662
## 24   23      6       1        0 0.44817927 0.30030719 0.8073720 0.24878823
## 25   25      5       0        1 0.44817927 0.30030719 0.8073720 0.24878823
## 26   32      4       0        2 0.44817927 0.30030719 0.8073720 0.24878823
## 27   34      2       0        1 0.44817927 0.30030719 0.8073720 0.24878823
## 28   35      1       0        1 0.44817927 0.30030719 0.8073720 0.24878823
##             strata     Group
## 1    Group=Placebo   Placebo
## 2    Group=Placebo   Placebo
## 3    Group=Placebo   Placebo
## 4    Group=Placebo   Placebo
## 5    Group=Placebo   Placebo
## 6    Group=Placebo   Placebo
## 7    Group=Placebo   Placebo
## 8    Group=Placebo   Placebo
## 9    Group=Placebo   Placebo
## 10   Group=Placebo   Placebo
## 11   Group=Placebo   Placebo
## 12   Group=Placebo   Placebo
## 13 Group=Treatment Treatment
## 14 Group=Treatment Treatment
## 15 Group=Treatment Treatment
## 16 Group=Treatment Treatment
## 17 Group=Treatment Treatment
## 18 Group=Treatment Treatment
## 19 Group=Treatment Treatment
## 20 Group=Treatment Treatment
## 21 Group=Treatment Treatment
## 22 Group=Treatment Treatment
## 23 Group=Treatment Treatment
## 24 Group=Treatment Treatment
## 25 Group=Treatment Treatment
## 26 Group=Treatment Treatment
## 27 Group=Treatment Treatment
## 28 Group=Treatment Treatment

1.5 Model Development


Cox proportional hazards regression analysis involving the response variable Week, factor variable Group and covariate variable WBC showed that :

[A] A cox proportional hazards regression model applied on the response variable Week and factor variable Group which includes the covariate variable WBC was most appropriate as the final survival model since its addition significantly improved model fit (p<0.001, Likelihood Ratio Test). However, the further inclusion of an interaction term Group * WBC into the model did not show any significant improvement to model fit (p=0.549, Likelihood Ratio Test).

[B] Using the final survival model controlled for WBC, association between Group and survival time Week was statistically significant with an estimated hazard ratio of 0.25 (95% CI 0.11, 0.57; p=0.001, Cox Proportional Hazards Model Wald Test) for Group=Treatment (lower hazard and higher survival rate for leukemia patients using the new treatment) relative to Group=Placebo (higher hazard and lower survival rate for leukemia patients using the placebo treatment).

[C] Using the final survival model controlled for Group, association between WBC and survival time Week was statistically significant with an estimated hazard ratio of 5.42 (95% CI 2.81, 10.50; p<0.001, Cox Proportional Hazards Model Wald Test) indicating higher hazard and lower survival rate for leukemia patients with higher white blood count measurement.

[D] The estimated Harrel’s concordance index (proportion of leukemia patients that the model can correctly order in terms of survival times when adjusted for censoring) for the final survival model was determined to be relatively high at 0.85 (optimistic) and 0.84 (optimism-corrected after a 500-cycle internal bootstrap validation).

[E] The scaled Schoenfield residuals showed random patterns when plotted against time indicating no violation of the proportional hazards assumption. The proportional hazards assumption was further supported by the non-significant relationship between the scaled Schoenfield residuals and time based from the global (p=0.9801) and individual (p=0.9927 for Group, 0.8414 for WBC ) statistical test results.

[F] The deviance residuals did not show any severely influential observations when plotted against the observations, the dfbeta residuals were reasonably symmetrical when plotted against the observations, and the fitted line with lowess function of the martingale residuals obtained for the null cox model was reasonably linear for the numeric covariate WBC - all indicating a fairly good model fit for the final survival model.

1.5.1 Univariate Cox Proportional Hazards Model

## Call:
## coxph(formula = Surv(Week, Status) ~ Group, data = DBP.Analysis)
## 
##                   coef exp(coef) se(coef)      z        p
## GroupTreatment -1.5721    0.2076   0.4124 -3.812 0.000138
## 
## Likelihood ratio test=16.35  on 1 df, p=5.261e-05
## n= 42, number of events= 30
## Call:
## coxph(formula = Surv(Week, Status) ~ Group, data = DBP.Analysis)
## 
##   n= 42, number of events= 30 
## 
##                   coef exp(coef) se(coef)      z Pr(>|z|)    
## GroupTreatment -1.5721    0.2076   0.4124 -3.812 0.000138 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                exp(coef) exp(-coef) lower .95 upper .95
## GroupTreatment    0.2076      4.817   0.09251    0.4659
## 
## Concordance= 0.69  (se = 0.041 )
## Likelihood ratio test= 16.35  on 1 df,   p=5e-05
## Wald test            = 14.53  on 1 df,   p=1e-04
## Score (logrank) test = 17.25  on 1 df,   p=3e-05
Characteristic HR1 95% CI1 p-value
Group
Placebo
Treatment 0.21 0.09, 0.47 <0.001
1 HR = Hazard Ratio, CI = Confidence Interval

## Call:
## coxph(formula = Surv(Week, Status) ~ WBC, data = DBP.Analysis)
## 
##      coef exp(coef) se(coef)     z        p
## WBC 1.646     5.189    0.298 5.525 3.29e-08
## 
## Likelihood ratio test=34.84  on 1 df, p=3.577e-09
## n= 42, number of events= 30
## Call:
## coxph(formula = Surv(Week, Status) ~ WBC, data = DBP.Analysis)
## 
##   n= 42, number of events= 30 
## 
##      coef exp(coef) se(coef)     z Pr(>|z|)    
## WBC 1.646     5.189    0.298 5.525 3.29e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##     exp(coef) exp(-coef) lower .95 upper .95
## WBC     5.188     0.1927     2.893     9.304
## 
## Concordance= 0.809  (se = 0.045 )
## Likelihood ratio test= 34.84  on 1 df,   p=4e-09
## Wald test            = 30.53  on 1 df,   p=3e-08
## Score (logrank) test = 36.62  on 1 df,   p=1e-09
Characteristic HR1 95% CI1 p-value
WBC 5.19 2.89, 9.30 <0.001
1 HR = Hazard Ratio, CI = Confidence Interval

Characteristic N Event N Median Survival
Overall 42 30 12 (8.0, 22)
Group 42 30
Placebo 8.0 (4.0, 12)
Treatment 23 (16, —)
WBC 42 30
1.45 — (—, —)
1.47 — (—, —)
1.5 12 (—, —)
1.78 — (—, —)
1.97 23 (—, —)
2.01 — (—, —)
2.05 — (—, —)
2.12 11 (—, —)
2.16 — (—, —)
2.2 — (—, —)
2.3 15 (—, —)
2.31 6.0 (—, —)
2.32 15 (8.0, —)
2.42 4.0 (—, —)
2.53 — (—, —)
2.57 23 (—, —)
2.6 — (—, —)
2.7 — (—, —)
2.73 22 (—, —)
2.8 5.0 (1.0, —)
2.88 13 (—, —)
2.95 17 (—, —)
2.96 10 (—, —)
3.05 8.0 (—, —)
3.06 12 (—, —)
3.2 — (—, —)
3.26 8.0 (—, —)
3.28 6.0 (—, —)
3.49 8.0 (5.0, —)
3.52 8.0 (—, —)
3.6 16 (—, —)
3.97 5.0 (—, —)
4.01 3.0 (—, —)
4.06 6.0 (—, —)
4.36 4.0 (—, —)
4.43 7.0 (—, —)
4.48 2.0 (—, —)
4.91 2.0 (—, —)
5 1.0 (—, —)

1.5.2 Multivariate Cox Proportional Hazards Model

## Call:
## coxph(formula = Surv(Week, Status) ~ Group + WBC, data = DBP.Analysis)
## 
##                   coef exp(coef) se(coef)      z       p
## GroupTreatment -1.3861    0.2501   0.4248 -3.263  0.0011
## WBC             1.6909    5.4243   0.3359  5.034 4.8e-07
## 
## Likelihood ratio test=46.71  on 2 df, p=7.187e-11
## n= 42, number of events= 30
## Call:
## coxph(formula = Surv(Week, Status) ~ Group + WBC, data = DBP.Analysis)
## 
##   n= 42, number of events= 30 
## 
##                   coef exp(coef) se(coef)      z Pr(>|z|)    
## GroupTreatment -1.3861    0.2501   0.4248 -3.263   0.0011 ** 
## WBC             1.6909    5.4243   0.3359  5.034  4.8e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                exp(coef) exp(-coef) lower .95 upper .95
## GroupTreatment    0.2501     3.9991    0.1088    0.5749
## WBC               5.4243     0.1844    2.8082   10.4776
## 
## Concordance= 0.852  (se = 0.04 )
## Likelihood ratio test= 46.71  on 2 df,   p=7e-11
## Wald test            = 33.6  on 2 df,   p=5e-08
## Score (logrank) test = 46.07  on 2 df,   p=1e-10
Characteristic HR1 95% CI1 p-value
Group
Placebo
Treatment 0.25 0.11, 0.57 0.001
WBC 5.42 2.81, 10.5 <0.001
1 HR = Hazard Ratio, CI = Confidence Interval

## Call:
## coxph(formula = Surv(Week, Status) ~ Group + WBC + Group * WBC, 
##     data = DBP.Analysis)
## 
##                        coef exp(coef) se(coef)      z        p
## GroupTreatment     -2.37491   0.09302  1.70547 -1.393    0.164
## WBC                 1.55489   4.73459  0.39866  3.900 9.61e-05
## GroupTreatment:WBC  0.31752   1.37372  0.52579  0.604    0.546
## 
## Likelihood ratio test=47.07  on 3 df, p=3.356e-10
## n= 42, number of events= 30
## Call:
## coxph(formula = Surv(Week, Status) ~ Group + WBC + Group * WBC, 
##     data = DBP.Analysis)
## 
##   n= 42, number of events= 30 
## 
##                        coef exp(coef) se(coef)      z Pr(>|z|)    
## GroupTreatment     -2.37491   0.09302  1.70547 -1.393    0.164    
## WBC                 1.55489   4.73459  0.39866  3.900 9.61e-05 ***
## GroupTreatment:WBC  0.31752   1.37372  0.52579  0.604    0.546    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                    exp(coef) exp(-coef) lower .95 upper .95
## GroupTreatment       0.09302    10.7500  0.003288     2.632
## WBC                  4.73459     0.2112  2.167413    10.342
## GroupTreatment:WBC   1.37372     0.7280  0.490169     3.850
## 
## Concordance= 0.851  (se = 0.041 )
## Likelihood ratio test= 47.07  on 3 df,   p=3e-10
## Wald test            = 32.39  on 3 df,   p=4e-07
## Score (logrank) test = 49.86  on 3 df,   p=9e-11
Characteristic HR1 95% CI1 p-value
Group
Placebo
Treatment 0.09 0.00, 2.63 0.2
WBC 4.73 2.17, 10.3 <0.001
Group * WBC
Treatment * WBC 1.37 0.49, 3.85 0.5
1 HR = Hazard Ratio, CI = Confidence Interval

Overall Survival HR (univariable) HR (multivariable) HR (multivariable reduced)
Group Placebo 21 (50.0) - - -
Treatment 21 (50.0) 0.21 (0.09-0.47, p<0.001) 0.25 (0.11-0.57, p=0.001) 0.25 (0.11-0.57, p=0.001)
WBC Mean (SD) 2.9 (0.9) 5.19 (2.89-9.30, p<0.001) 5.42 (2.81-10.48, p<0.001) 5.42 (2.81-10.48, p<0.001)

1.5.3 Model Selection

## Call:
## coxph(formula = Surv(Week, Status) ~ Group, data = DBP.Analysis)
## 
##   n= 42, number of events= 30 
## 
##                   coef exp(coef) se(coef)      z Pr(>|z|)    
## GroupTreatment -1.5721    0.2076   0.4124 -3.812 0.000138 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                exp(coef) exp(-coef) lower .95 upper .95
## GroupTreatment    0.2076      4.817   0.09251    0.4659
## 
## Concordance= 0.69  (se = 0.041 )
## Likelihood ratio test= 16.35  on 1 df,   p=5e-05
## Wald test            = 14.53  on 1 df,   p=1e-04
## Score (logrank) test = 17.25  on 1 df,   p=3e-05
## Call:
## coxph(formula = Surv(Week, Status) ~ Group + WBC, data = DBP.Analysis)
## 
##   n= 42, number of events= 30 
## 
##                   coef exp(coef) se(coef)      z Pr(>|z|)    
## GroupTreatment -1.3861    0.2501   0.4248 -3.263   0.0011 ** 
## WBC             1.6909    5.4243   0.3359  5.034  4.8e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                exp(coef) exp(-coef) lower .95 upper .95
## GroupTreatment    0.2501     3.9991    0.1088    0.5749
## WBC               5.4243     0.1844    2.8082   10.4776
## 
## Concordance= 0.852  (se = 0.04 )
## Likelihood ratio test= 46.71  on 2 df,   p=7e-11
## Wald test            = 33.6  on 2 df,   p=5e-08
## Score (logrank) test = 46.07  on 2 df,   p=1e-10
## Call:
## coxph(formula = Surv(Week, Status) ~ Group + WBC + Group * WBC, 
##     data = DBP.Analysis)
## 
##   n= 42, number of events= 30 
## 
##                        coef exp(coef) se(coef)      z Pr(>|z|)    
## GroupTreatment     -2.37491   0.09302  1.70547 -1.393    0.164    
## WBC                 1.55489   4.73459  0.39866  3.900 9.61e-05 ***
## GroupTreatment:WBC  0.31752   1.37372  0.52579  0.604    0.546    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                    exp(coef) exp(-coef) lower .95 upper .95
## GroupTreatment       0.09302    10.7500  0.003288     2.632
## WBC                  4.73459     0.2112  2.167413    10.342
## GroupTreatment:WBC   1.37372     0.7280  0.490169     3.850
## 
## Concordance= 0.851  (se = 0.041 )
## Likelihood ratio test= 47.07  on 3 df,   p=3e-10
## Wald test            = 32.39  on 3 df,   p=4e-07
## Score (logrank) test = 49.86  on 3 df,   p=9e-11
## Analysis of Deviance Table
##  Cox model: response is  Surv(Week, Status)
##  Model 1: ~ Group
##  Model 2: ~ Group + WBC
##    loglik  Chisq Df P(>|Chi|)    
## 1 -85.008                        
## 2 -69.828 30.361  1 3.587e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Analysis of Deviance Table
##  Cox model: response is  Surv(Week, Status)
##  Model 1: ~ Group + WBC
##  Model 2: ~ Group + WBC + Group * WBC
##    loglik  Chisq Df P(>|Chi|)
## 1 -69.828                    
## 2 -69.648 0.3594  1    0.5488

1.5.4 Model Performance Validation

## 
## Somers' Rank Correlation for Censored Data    Response variable:Surv(Week, Status)
## 
##                                C  Dxy aDxy    SD    Z P  n
## I(-1 * LP.Univariate.Group) 0.69 0.38 0.38 0.082 4.62 0 42
## concordant 
##  0.6900421
## [1] 0.6900421
## [1] 0.6894831
## 
## Somers' Rank Correlation for Censored Data    Response variable:Surv(Week, Status)
## 
##                                      C   Dxy  aDxy    SD    Z P  n
## I(-1 * LP.Multivariate.GroupWBC) 0.852 0.704 0.704 0.081 8.73 0 42
## concordant 
##  0.8520337
## [1] 0.8520337
## [1] 0.8448274
## 
## Somers' Rank Correlation for Censored Data    Response variable:Surv(Week, Status)
## 
##                                                      C   Dxy  aDxy    SD    Z P
## I(-1 * LP.Multivariate.GroupWBC.WithInteraction) 0.851 0.701 0.701 0.082 8.57 0
##                                                   n
## I(-1 * LP.Multivariate.GroupWBC.WithInteraction) 42
## concordant 
##  0.8506311
## [1] 0.8506311
## [1] 0.8406148

1.5.5 Model Diagnostics

## Call:
## coxph(formula = Surv(Week, Status) ~ Group + WBC, data = DBP.Analysis)
## 
##   n= 42, number of events= 30 
## 
##                   coef exp(coef) se(coef)      z Pr(>|z|)    
## GroupTreatment -1.3861    0.2501   0.4248 -3.263   0.0011 ** 
## WBC             1.6909    5.4243   0.3359  5.034  4.8e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                exp(coef) exp(-coef) lower .95 upper .95
## GroupTreatment    0.2501     3.9991    0.1088    0.5749
## WBC               5.4243     0.1844    2.8082   10.4776
## 
## Concordance= 0.852  (se = 0.04 )
## Likelihood ratio test= 46.71  on 2 df,   p=7e-11
## Wald test            = 33.6  on 2 df,   p=5e-08
## Score (logrank) test = 46.07  on 2 df,   p=1e-10
Characteristic HR1 95% CI1 p-value
Group
Placebo
Treatment 0.25 0.11, 0.57 0.001
WBC 5.42 2.81, 10.5 <0.001
1 HR = Hazard Ratio, CI = Confidence Interval

##       Group      WBC
## 1   Placebo 2.930238
## 2 Treatment 2.930238

##           chisq df    p
## Group  8.27e-05  1 0.99
## WBC    4.00e-02  1 0.84
## GLOBAL 4.02e-02  2 0.98

1.6 References


[Book] Survival Analysis : A Self-Learning Text by David Kleinbaum and Mitchel Klein
[Book] Applied Survival Analysis Using R by Dirk Moore
[Book] R for Health Data Science by Ewen Harrison and Riinu Pius
[Book] Supervised Machine Learning by Michael Foley
[Book] Data Science for Biological, Medical and Health Research by Thomas Love
[Article] Survival Analysis by Alboukadel Kassambara
[Article] Survival Analysis in R by Emily Zabor
[Article] Survival Analysis with R by Joseph Rickert
[Article] Assessment of Discrimination in Survival Analysis by R-Studio Team
[Article] Cox Model Assumptions by Alboukadel Kassambara
[Publication] Survival Analysis Part I: Basic Concepts and First Analyses by Taane Clark (British Journal of Cancer)
[Publication] Survival Analysis Part II: Multivariate Data Analysis – An Introduction to Concepts and Methods by Mike Bradburn (British Journal of Cancer)
[Publication] Survival Analysis Part III: Multivariate Data Analysis – Choosing a Model and Assessing its Adequacy and Fit by Mike Bradburn (British Journal of Cancer)
[Publication] Survival Analysis Part IV: Further Concepts and Methods in Survival Analysis by Taane Clark (British Journal of Cancer)
[Tutorial] Survival Analysis / Time-To-Event Analysis in R by Heidi Seibold (Datacamp)
[R Package] moments by Lukasz Komsta and Frederick Novomestky
[R Package] car by John Fox and Sanford Weisberg
[R Package] multcomp by Torsten Hothorn, Frank Bretz and Peter Westfall
[R Package] effects by John Fox and Sanford Weisberg
[R Package] psych by William Revelle
[R Package] ggplot2 by Hadley Wickham
[R Package] dplyr by Hadley Wickham
[R Package] ggpubr by Alboukadel Kassambara
[R Package] rstatix by Alboukadel Kassambara
[R Package] ggfortify by Yuan Tang
[R Package] trend by Thorsten Pohlert
[R Package] survival by Terry Therneau
[R Package] rms by Frank Harrell
[R Package] survminer by Alboukadel Kassambara
[R Package] Hmisc by Frank Harrel
[R Package] finalfit by Ewen Harrison
[R Package] knitr by Yihui Xie
[R Package] gtsummary by Daniel Sjoberg