This case study examines survival among colon cancer patients in different treatment groups, and the association between explanatory variables and the end point, time to recurrence of cancer. The three treatment groups studied were observation, Levamisole, and Levamisole + 5-FU. The data were extracted from a study of adjuvant chemotherapy for stage B/C colon cancer.

Variables in the dataset:

id: id

study: 1 for all patients

rx: Treatment - 1 = Obs(ervation), 2 = Lev(amisole), 3 = Lev(amisole)+5-FU

sex: (1 = male, 0 = other)

age: in years

obstruct: obstruction of colon by tumour (1 = yes, 0 = no)

perfor: perforation of colon (1 = yes, 0 = no)

adhere: adherence to nearby organs (1 = yes, 0 = no)

nodes: number of lymph nodes with detectable cancer

time: days until event or censoring

status: censoring status (1 = event, 0 = censored)

differ: differentiation of tumour (1 = well, 2 = moderate, 3 = poor)

extent: Extent of local spread (1 = submucosa, 2 = muscle, 3 = serosa, 4 = contiguous structures)

surg: time from surgery to registration (0 = short, 1 = long)

Data Management

Data management is carried out so that variables are coded consistently for analysis. Some variables are recoded to make interpretation clearer. A new variable, stage, is created using the Dukes' classification (from the Bowel Cancer UK website), which assigns participants to stage B or C (stages A and D are not included in this study).
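The report does not spell out the recoding rule for stage, so the following is only a minimal sketch under an assumption: in the Dukes' classification, stage B has no involved lymph nodes and stage C has at least one, so stage is derived here from the nodes variable. The report's own analysis was done in R; the Python function name `dukes_stage` and the example values are hypothetical.

```python
from typing import Optional


def dukes_stage(nodes: Optional[int]) -> Optional[str]:
    """Assumed rule: no positive nodes -> stage B, at least one -> stage C."""
    if nodes is None:
        # Node count not recorded, so the stage stays missing (NA)
        return None
    return "stageC" if nodes > 0 else "stageB"


# Hypothetical node counts for four participants (not taken from the dataset)
example_nodes = [0, 3, None, 12]
stages = [dukes_stage(n) for n in example_nodes]
print(stages)
```

Under this assumed rule, a participant with a missing node count gets a missing stage, which is consistent with the NA stage values reported later.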

Data Exploration

a) Number of observations, variables & censoring

Censoring: For censored participants in this study, the reasons for censoring are unknown. A censored participant contributes to the survival curve only up to the time of censoring.

Status Frequency Proportion
Event 468 0.5
Censored 461 0.5

This dataset has 929 observations and 15 variables. The event occurred in 468 participants (50%), and 461 participants (50%) were censored.
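The proportions shown as 0.5 are rounded. As a small check of the arithmetic, the sketch below recomputes them from the two counts in the table; the helper name `pct` is hypothetical, and Python is used only for illustration (the report itself was produced in R).

```python
events, censored = 468, 461      # counts from the status table
total = events + censored        # number of observations


def pct(count: int, total: int) -> float:
    """Percentage of the total, rounded to two decimals."""
    return round(100 * count / total, 2)


pct_event = pct(events, total)
pct_censored = pct(censored, total)
print(total, pct_event, pct_censored)
```

The two percentages agree with the more precise values (50.38% and 49.62%) given in the categorical-variable table.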

b) Histogram to observe outcome (time to recurrence of cancer) distribution

The plot shows two peaks within the distribution of the time it had taken for the cancer to recur. It is not normally distributed and has a bimodal distribution.

c) Time-independent variables

The table below shows the means, standard deviations (SDs), minimum and maximum values for the numeric variables in the dataset: age, nodes (number of lymph nodes with detectable cancer), and time (days until event or censoring).

Age, time and nodes overall statistics

Variables Mean SD Min Max
age 59.75 11.95 18 85
nodes 3.66 3.57 0 33
time 1405.14 998.90 8 3329

Breakdown of age and time variables by treatment groups

Treatment Group Mean Age SD of Age Min Age Max Age
Observation 59.45 11.97 18 85
Levamisole 60.11 11.65 27 83
Levamisole+5-FU 59.70 12.26 26 81

Treatment Group Mean Time SD of Time Min Time Max Time
Observation 1281.24 973.71 20 3192
Levamisole 1315.89 1009.91 19 3329
Levamisole+5-FU 1624.52 980.27 8 3309

Mean age was highest in the Levamisole group (60.11 years) and lowest in the observation group (59.45 years). Mean time to recurrence or censoring was longest in the Levamisole + 5-FU group (1624.52 days) and shortest in the observation group (1281.24 days).

d) Categorical Variables

Treatment group, sex and the other cancer-related variables are categorical, so they are summarized separately as counts and percentages.

Variable Categories Total Percentage
adhere No 794 85.47
adhere Yes 135 14.53
differ Moderate 663 71.37
differ Poor 150 16.15
differ Well 93 10.01
differ NA 23 2.48
extent Contiguous Structures 43 4.63
extent Muscle 106 11.41
extent Serosa 759 81.70
extent Submucosa 21 2.26
obstruct No 749 80.62
obstruct Yes 180 19.38
perfor No 902 97.09
perfor Yes 27 2.91
rx Levamisole 310 33.37
rx Levamisole+5-FU 304 32.72
rx Observation 315 33.91
sex Male 484 52.10
sex Other 445 47.90
stage stageB 2 0.22
stage stageC 909 97.85
stage NA 18 1.94
status Censored 461 49.62
status Event 468 50.38
surg Long 247 26.59
surg Short 682 73.41

The event occurred in 468 participants (50.38%), and 461 (49.62%) were censored. The observation group had 315 participants (33.91%), the Levamisole group 310 (33.37%), and the Levamisole + 5-FU group 304 (32.72%).

Males make up 52.10% of the dataset and other sexes 47.90%. The time from surgery to registration was short for 682 participants (73.41%) and long for 247 (26.59%). Very few participants had (definitive) stage B cancer: 0.22% (only 2 participants). Stage is NA for 18 participants because the number of lymph nodes was not recorded for them in the dataset.

Cancer-related complications were uncommon: obstruction was present in 19.38% of participants, adherence in 14.53%, and perforation in 2.91%. Tumors were mostly moderately differentiated (71.37% of participants) and had mostly spread to the serosa (81.70% of participants).

Proportion of sex in each treatment group

Treatment Group Sex Total Percentage
Observation Male 166 52.70
Observation Other 149 47.30
Levamisole Male 177 57.10
Levamisole Other 133 42.90
Levamisole+5-FU Male 141 46.38
Levamisole+5-FU Other 163 53.62

The proportion of males was highest in the Levamisole group (57.10%) and lowest in the Levamisole + 5-FU group (46.38%).

e) Missingness of the dataset

Missingness is checked to determine the type of missing data, so that the analysis can be conducted accordingly.

##     id study rx sex age obstruct perfor adhere status extent surg time nodes
## 888  1     1  1   1   1        1      1      1      1      1    1    1     1
## 23   1     1  1   1   1        1      1      1      1      1    1    1     1
## 18   1     1  1   1   1        1      1      1      1      1    1    1     0
##      0     0  0   0   0        0      0      0      0      0    0    0    18
##     differ   
## 888      1  0
## 23       0  1
## 18       1  1
##         23 41

According to the above output, missingness in this dataset is minimal: 888 participants have complete data, 23 are missing only differ, and 18 are missing only nodes. There is no clear pattern of missingness, so the data are treated as missing completely at random (MCAR).

In total, 41 values are missing (23 in differ and 18 in nodes), one for each of 41 participants.
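The missing-data pattern table groups rows by which variables are missing and counts each group. The sketch below shows that idea on a hypothetical five-row dataset, then applies simple arithmetic to the counts reported above. It is a Python illustration only, not the R routine that produced the report's output, and the function name `missing_patterns` is an assumption.

```python
from collections import Counter

# Hypothetical mini-dataset: None marks a missing value
rows = [
    {"nodes": 2, "differ": 2},
    {"nodes": 5, "differ": None},
    {"nodes": None, "differ": 1},
    {"nodes": 1, "differ": 3},
    {"nodes": 4, "differ": None},
]


def missing_patterns(rows):
    """Count rows per pattern; a pattern is the tuple of missing column names."""
    patterns = Counter()
    for row in rows:
        missing = tuple(sorted(k for k, v in row.items() if v is None))
        patterns[missing] += 1
    return patterns


patterns = missing_patterns(rows)
for pattern, n in patterns.most_common():
    print(n, pattern if pattern else "complete")

# Share of incomplete rows, using the counts reported in the pattern table
complete, miss_differ, miss_nodes = 888, 23, 18
n_total = complete + miss_differ + miss_nodes
incomplete_share = round(100 * (miss_differ + miss_nodes) / n_total, 2)
print(incomplete_share)
```

With the reported counts, about 4.4% of participants have an incomplete record.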

Survival Curves

To plot survival curves in R, the survival data must first be set up with the time and the status (censored versus event). The survfit function is then used to fit the Kaplan-Meier curve.

Kaplan-Meier Curve

The overall survival can be observed in this plot.

This plot shows the survival in all participants along with the 95% CIs (the gray area showing upper and lower bounds).
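To make the Kaplan-Meier estimate concrete, here is a minimal pure-Python sketch of the product-limit calculation. It is not the R code used for the report; the function name `kaplan_meier` and the six follow-up times are hypothetical, and confidence intervals are left out.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time per participant
    events : 1 = event observed, 0 = censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all participants who share this follow-up time
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            # Product-limit step: multiply by the fraction surviving time t
            surv *= 1 - n_events / at_risk
            curve.append((t, surv))
        # Events and censored participants both leave the risk set
        at_risk -= n_removed
    return curve


# Hypothetical follow-up times in days
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
km = kaplan_meier(times, events)
for t, s in km:
    print(t, round(s, 4))
```

Censored participants never lower the curve themselves; they only reduce the number at risk for later event times.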

Survival by treatment groups

The curves show the survival of the participants between the 3 treatment groups.

Survival is best in the Levamisole + 5-FU group. The observation and Levamisole groups show poorer survival, with similar results, and their curves overlap.

Checking assumptions

Complementary log-log curve

The observation and Levamisole curves are plotted very close together and cross in places. However, there is no major deviation or crossover between the treatment groups in either the survival or the complementary log-log curves. We can therefore conclude that the Cox PH assumption is reasonably met.
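The reasoning behind the complementary log-log plot is that, under proportional hazards, the transformed curves log(-log S(t)) of two groups differ by a constant (the log HR), so they look parallel. The sketch below illustrates this with hypothetical survival values constructed to satisfy proportional hazards exactly; the function name `cloglog` and the numbers are assumptions, not values from the report.

```python
import math


def cloglog(surv: float) -> float:
    """Complementary log-log transform log(-log(S)); defined for 0 < S < 1."""
    if not 0.0 < surv < 1.0:
        raise ValueError("S(t) must lie strictly between 0 and 1")
    return math.log(-math.log(surv))


# Hypothetical survival estimates for group A at three time points
group_a = [0.90, 0.75, 0.60]
# Group B is built with half the hazard of group A: S_B(t) = S_A(t) ** 0.5
group_b = [s ** 0.5 for s in group_a]

# Vertical gap between the transformed curves at each time point
gaps = [cloglog(a) - cloglog(b) for a, b in zip(group_a, group_b)]
print([round(g, 4) for g in gaps])
```

Every gap equals log(2), the log of the hazard ratio between the two hypothetical groups. In real data the gaps fluctuate, and curves that cross clearly or diverge suggest non-proportional hazards.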

Log rank test
  • H0: There is no difference in survival between treatment groups
  • Ha: There is a difference in survival between the treatment groups
Wilcoxon test
  • H0: There is no difference in survival between treatment groups
  • Ha: There is a difference in survival between treatment groups

As the p-value (p = 1e-05) is less than alpha (0.05) for both the log rank and Wilcoxon tests, we reject the null hypothesis and conclude that there is a difference in survival between the treatment groups.
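The log rank test compares, at each event time, the observed number of events in a group with the number expected if all groups shared the same survival. The sketch below is a simplified two-group version in pure Python; the report compares three groups using R, so this only illustrates the idea. The function name `logrank_two_groups` and the toy data are hypothetical.

```python
import math


def logrank_two_groups(times, events, groups):
    """Two-group log rank test; groups holds 0 or 1 for each participant."""
    observed_1 = expected_1 = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # Risk set: everyone still under observation just before time t
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        d1 = sum(
            1 for x, e, g in zip(times, events, groups)
            if x == t and e == 1 and g == 1
        )
        observed_1 += d1
        expected_1 += d * n1 / n
        if n > 1:
            # Hypergeometric variance of the event count in group 1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi_sq = (observed_1 - expected_1) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi_sq / 2))
    return chi_sq, p_value


# Hypothetical data: group 0 has all the early events, group 1 the late ones
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
chi_sq, p_value = logrank_two_groups(times, events, groups)
print(round(chi_sq, 2), round(p_value, 4))
```

The Wilcoxon test has the same structure but weights each event time by the number at risk, so it gives more weight to early differences.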

Survival by explanatory variables

a) Sex

Though the survival curves for male and other sexes are similar, males show slightly better survival overall in this plot.

b) Extent of cancer spread

The above plot clearly shows that the more extensive the local spread of the tumor, the worse the survival. Participants whose cancer involved only the submucosa had the best survival, while those with spread to contiguous structures had the worst.

c) Differentiation of tumor

From the above plot, patients with well or moderately differentiated tumors have better survival, and their curves lie close together and overlap. Patients with poorly differentiated tumors have the worst survival.

d) Time between surgery until registration

Participants with a short time between surgery and registration show better survival than those with a long time.

Cox proportional hazards models

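Cox proportional hazards models are generally used in clinical trials to estimate the hazard ratio (HR) for each treatment group relative to a reference group. The higher the HR, the poorer the survival in that group, and vice versa; an HR below 1 indicates a lower hazard than in the reference group.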

Treatment groups

The Cox PH model estimates the treatment effect for the Levamisole and Levamisole + 5-FU groups compared with the observation group. The table below shows the model coefficients (log HRs), their 95% CIs and the p-values; exponentiating a coefficient gives the HR.

Term Coefficient (log HR) Lower Bound Upper Bound P-value
factor(rx)Levamisole -0.02 -0.22 0.19 0.89
factor(rx)Levamisole+5-FU -0.51 -0.74 -0.28 0.00

From the above table, only the Levamisole + 5-FU group shows a statistically significant improvement in survival compared with the observation group (coefficient -0.51, HR = exp(-0.51) ≈ 0.60). The Levamisole group does not differ significantly from the observation group (p = 0.89).
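A hazard ratio is always positive, so the negative values in the table above can only be on the log scale (Cox model coefficients). The sketch below exponentiates the tabulated values to obtain HRs with their 95% CIs. The values are copied from the table; the helper name `to_hazard_ratio` is hypothetical, and Python is used purely for the arithmetic.

```python
import math

# (coefficient, lower bound, upper bound) copied from the table above
treatment_coefs = {
    "Levamisole": (-0.02, -0.22, 0.19),
    "Levamisole+5-FU": (-0.51, -0.74, -0.28),
}


def to_hazard_ratio(coef, lower, upper):
    """Exponentiate a log HR and its CI bounds, rounded to two decimals."""
    return tuple(round(math.exp(x), 2) for x in (coef, lower, upper))


hazard_ratios = {
    group: to_hazard_ratio(*values) for group, values in treatment_coefs.items()
}
for group, (hr, lo, hi) in hazard_ratios.items():
    print(f"{group}: HR {hr} (95% CI {lo} to {hi})")
```

On this scale, a CI that excludes 1 corresponds to a coefficient CI that excludes 0, which is the case only for Levamisole + 5-FU.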

Selecting final model

Backward elimination will be used to remove covariates whose HRs are not statistically significant, that is, those with p-values greater than 0.05. Necessary covariates, such as the treatment group variable, will remain in all the models.

The output from the final model, which excludes the variables perfor, obstruct and stage, is shown below. The table shows the coefficient (log HR) for each variable and its categories from the final Cox PH model, with the 95% CIs and the p-values.

Term Coefficient (log HR) Lower Bound Upper Bound P-value
factor(rx)Levamisole -0.08 -0.29 0.14 0.49
factor(rx)Levamisole+5-FU -0.52 -0.76 -0.28 0.00
factor(sex)Other 0.16 -0.03 0.35 0.09
age 0.00 -0.01 0.00 0.33
factor(adhere)No -0.19 -0.44 0.07 0.15
nodes 0.08 0.06 0.10 0.00
factor(differ)Moderate -0.04 -0.36 0.28 0.81
factor(differ)Poor 0.21 -0.17 0.59 0.28
factor(extent)Muscle 0.41 -0.63 1.45 0.43
factor(extent)Serosa 0.95 -0.04 1.94 0.06
factor(extent)Contiguous Structures 1.33 0.26 2.40 0.01
factor(surg)Short -0.24 -0.45 -0.04 0.02

From the above table, spread of cancer to contiguous structures had the highest coefficient (1.33 on the log scale, HR ≈ 3.78), and it is statistically significant (p = 0.01). Among the treatment groups, the Levamisole + 5-FU group had the lowest coefficient (-0.52, HR ≈ 0.59), which is statistically significant. The other statistically significant covariates are nodes, where each additional positive node raises the hazard (0.08, HR ≈ 1.08), and a short time from surgery to registration, which lowers it (-0.24, HR ≈ 0.79). The effect of sex is not statistically significant (p = 0.09).
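As a consistency check on the table, an approximate p-value can be recovered from a coefficient and its 95% CI. The sketch assumes the intervals are symmetric Wald intervals (estimate ± 1.96 standard errors), which the report does not state explicitly. Because the inputs are rounded to two decimals, the results only approximate the tabulated p-values. The function name `wald_p_from_ci` is hypothetical.

```python
import math


def wald_p_from_ci(coef, lower, upper, z_crit=1.96):
    """Approximate two-sided p-value, assuming a symmetric Wald 95% CI."""
    se = (upper - lower) / (2 * z_crit)      # back out the standard error
    z = coef / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area


# Values copied from the final-model table
p_contiguous = wald_p_from_ci(1.33, 0.26, 2.40)
p_sex = wald_p_from_ci(0.16, -0.03, 0.35)
p_surg = wald_p_from_ci(-0.24, -0.45, -0.04)
print(round(p_contiguous, 3), round(p_sex, 3), round(p_surg, 3))
```

The recovered values fall below 0.05 for contiguous structures and short time from surgery, and above 0.05 for sex, in line with the CIs excluding or including 0.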

Confounding & interaction

Confounding

Hypotheses:

  • H0: Adjusting for the covariate does not change the estimated treatment effect

  • H1: Adjusting for the covariate changes the estimated treatment effect

Age

Term Coefficient (log HR) Lower Bound Upper Bound P-value
factor(rx)Levamisole -0.01 -0.22 0.20 0.90
factor(rx)Levamisole+5-FU -0.51 -0.74 -0.28 0.00
age -0.01 -0.01 0.00 0.11

Sex

Term Coefficient (log HR) Lower Bound Upper Bound P-value
factor(rx)Levamisole -0.01 -0.22 0.20 0.92
factor(rx)Levamisole+5-FU -0.52 -0.75 -0.28 0.00
sexOther 0.11 -0.08 0.29 0.25

Compared with the treatment-only model, the treatment estimates and their 95% CIs barely change after adjusting for age or sex (the Levamisole + 5-FU estimate stays between -0.51 and -0.52). Therefore, age and sex do not appear to confound the treatment effect.
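One common way to quantify "no meaningful change" is the relative change in the treatment coefficient after adding the covariate. A change above roughly 10% is a widely used rule of thumb for confounding, but that threshold is an assumption here and is not stated in the report. The sketch uses the Levamisole + 5-FU estimates from the tables above; the Levamisole estimate is close to zero, so its relative change would be unstable and is left out. The function name `percent_change` is hypothetical.

```python
def percent_change(crude: float, adjusted: float) -> float:
    """Relative change (%) in a coefficient after adding a covariate."""
    return abs(adjusted - crude) / abs(crude) * 100


crude_lev_5fu = -0.51   # treatment-only model
adj_for_age = -0.51     # model adjusted for age
adj_for_sex = -0.52     # model adjusted for sex

change_age = round(percent_change(crude_lev_5fu, adj_for_age), 1)
change_sex = round(percent_change(crude_lev_5fu, adj_for_sex), 1)
print(change_age, change_sex)
```

Both changes are far below the assumed 10% threshold, which supports the conclusion that neither age nor sex confounds the treatment effect.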

Interaction

Interaction between treatment group and time from surgery to registration is checked for this model. The purpose of this test is to see whether the treatment effect differs according to the time from surgery to registration.

Term Coefficient (log HR) Lower Bound Upper Bound P-value
factor(rx)Levamisole 0.13 -0.26 0.51 0.51
factor(rx)Levamisole+5-FU -0.26 -0.68 0.16 0.23
surgShort -0.08 -0.40 0.24 0.62
factor(rx)Levamisole:surgShort -0.19 -0.65 0.27 0.41
factor(rx)Levamisole+5-FU:surgShort -0.34 -0.85 0.16 0.18

There is no statistically significant interaction between the treatment groups and time from surgery to registration (p = 0.41 and p = 0.18 for the two interaction terms).
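In a model with an interaction term, the treatment effect within each stratum is the sum of the main-effect coefficient and the relevant interaction coefficient. The sketch below works this out for Levamisole + 5-FU using the tabulated values, reading them as log HRs (they include negative values) and taking long time from surgery as the reference level, since the table lists surgShort. The dictionary keys are hypothetical labels.

```python
import math

# Coefficients copied from the interaction table above
coef = {
    "lev5fu": -0.26,            # Levamisole+5-FU vs observation when surg = Long
    "lev5fu:surgShort": -0.34,  # additional treatment effect when surg = Short
}

log_hr_long = coef["lev5fu"]
log_hr_short = coef["lev5fu"] + coef["lev5fu:surgShort"]

hr_long = round(math.exp(log_hr_long), 2)
hr_short = round(math.exp(log_hr_short), 2)
print(hr_long, hr_short)
```

The point estimates suggest a somewhat stronger treatment effect in the short group, but because the interaction term is not statistically significant, this difference is compatible with chance.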