This case study compares the survival of colon cancer patients across treatment groups and examines the association between explanatory variables and time to cancer recurrence, which is the end point. The three treatment groups studied were observation, Levamisole, and Levamisole + 5-FU. The data were extracted from a study of adjuvant chemotherapy for stage B/C colon cancer.
Variables in the dataset:
id: id
study: 1 for all patients
rx: Treatment - 1 = Obs(ervation), 2 = Lev(amisole), 3 = Lev(amisole)+5-FU
sex: (1 = male, 0 = other)
age: in years
obstruct: obstruction of colon by tumour (1 = yes, 0 = no)
perfor: perforation of colon (1 = yes, 0 = no)
adhere: adherence to nearby organs (1 = yes, 0 = no)
nodes: number of lymph nodes with detectable cancer
time: days until event or censoring
status: censoring status (1 = event, 0 = censored)
differ: differentiation of tumour (1 = well, 2 = moderate, 3 = poor)
extent: Extent of local spread (1 = submucosa, 2 = muscle, 3 = serosa, 4 = contiguous structures)
surg: time from surgery to registration (0 = short, 1 = long)
Some of the variables are recoded for clarity during data interpretation and to keep the coding of variables consistent during analysis. A new variable, stage, is created using the Dukes' classification (from the Bowel Cancer UK website) to classify each colon cancer as stage B or C (stages A and D are not included in this study).
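The stage recoding can be sketched as follows. This is an illustration only: the rule (stage B when no lymph nodes are positive, stage C when at least one is, missing when nodes is missing) is an assumption inferred from the variable descriptions and the stage counts in this report, and the function name dukes_stage is hypothetical.

```python
from typing import Optional


def dukes_stage(nodes: Optional[int]) -> Optional[str]:
    """Assumed rule: no positive nodes -> stage B, one or more -> stage C."""
    if nodes is None:
        # A missing node count leaves the stage undefined (NA).
        return None
    return "stageC" if nodes > 0 else "stageB"


# Hypothetical node counts, including one missing value.
example_nodes = [0, 1, 5, None]
example_stages = [dukes_stage(n) for n in example_nodes]
print(example_stages)
```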
Censoring: For censored participants in this study, the causes of censoring are unknown. A participant's survival after the censoring time is not observed and is therefore not shown in the plot.
| Status | Frequency | Proportion |
|---|---|---|
| Event | 468 | 0.5 |
| Censored | 461 | 0.5 |
This dataset has 929 observations and 15 variables. An event occurred in 468 participants (50%), and 461 participants (50%) were censored.
The plot shows two peaks in the distribution of the time taken for the cancer to recur. The distribution is bimodal, not normal.
The table below shows the means, standard deviations (SDs), and minimum and maximum values for the numeric variables in the dataset: age, nodes, and time (days until recurrence of cancer or censoring).
Age, time and nodes overall statistics
| Variables | Mean | SD | Min | Max |
|---|---|---|---|---|
| age | 59.75 | 11.95 | 18 | 85 |
| nodes | 3.66 | 3.57 | 0 | 33 |
| time | 1405.14 | 998.90 | 8 | 3329 |
Breakdown of age and time variables by treatment groups
| Treatment Group | Mean Age | SD of Age | Min Age | Max Age |
|---|---|---|---|---|
| Observation | 59.45 | 11.97 | 18 | 85 |
| Levamisole | 60.11 | 11.65 | 27 | 83 |
| Levamisole+5-FU | 59.70 | 12.26 | 26 | 81 |
| Treatment Group | Mean Time | SD of Time | Min Time | Max Time |
|---|---|---|---|---|
| Observation | 1281.24 | 973.71 | 20 | 3192 |
| Levamisole | 1315.89 | 1009.91 | 19 | 3329 |
| Levamisole+5-FU | 1624.52 | 980.27 | 8 | 3309 |
Mean age was highest in the Levamisole group (60.11 years) and lowest in the observation group (59.45 years). The mean of time (days until recurrence or censoring) was highest in the Levamisole + 5-FU group (1624.52 days) and lowest in the observation group (1281.24 days).
Treatment group, sex and the other cancer-related variables are categorical, so they are summarized separately as counts and percentages.
| Variable | Categories | Total | Percentage |
|---|---|---|---|
| adhere | No | 794 | 85.47 |
| adhere | Yes | 135 | 14.53 |
| differ | Moderate | 663 | 71.37 |
| differ | Poor | 150 | 16.15 |
| differ | Well | 93 | 10.01 |
| differ | NA | 23 | 2.48 |
| extent | Contiguous Structures | 43 | 4.63 |
| extent | Muscle | 106 | 11.41 |
| extent | Serosa | 759 | 81.70 |
| extent | Submucosa | 21 | 2.26 |
| obstruct | No | 749 | 80.62 |
| obstruct | Yes | 180 | 19.38 |
| perfor | No | 902 | 97.09 |
| perfor | Yes | 27 | 2.91 |
| rx | Levamisole | 310 | 33.37 |
| rx | Levamisole+5-FU | 304 | 32.72 |
| rx | Observation | 315 | 33.91 |
| sex | Male | 484 | 52.10 |
| sex | Other | 445 | 47.90 |
| stage | stageB | 2 | 0.22 |
| stage | stageC | 909 | 97.85 |
| stage | NA | 18 | 1.94 |
| status | Censored | 461 | 49.62 |
| status | Event | 468 | 50.38 |
| surg | Long | 247 | 26.59 |
| surg | Short | 682 | 73.41 |
The event occurred in 468 participants (50.38%), and 461 (49.62%) were censored. The observation group had 315 participants (33.91%), the Levamisole group had 310 (33.37%), and the Levamisole + 5-FU group had 304 (32.72%).
Males make up 52.10% of this dataset, and other sexes 47.90%. The time from surgery to registration was short in 682 (73.41%) and long in 247 (26.59%) of the participants. The percentage of (definitive) stage B cancer is very low, 0.22% (only 2 participants). Stage is NA for 18 participants because their number of lymph nodes was not specified in the dataset.
Cancer-related complications such as obstruction, perforation and adherence were uncommon. Tumors were mostly moderately differentiated (in 71.37% of the participants) and had mostly spread to the serosa (in 81.70% of the participants).
Proportion of sex in each treatment group
| Treatment Group | Sex | Total | Percentage |
|---|---|---|---|
| Observation | Male | 166 | 52.70 |
| Observation | Other | 149 | 47.30 |
| Levamisole | Male | 177 | 57.10 |
| Levamisole | Other | 133 | 42.90 |
| Levamisole+5-FU | Male | 141 | 46.38 |
| Levamisole+5-FU | Other | 163 | 53.62 |
The proportion of males was highest in the Levamisole group (57.10% of participants) and lowest in the Levamisole + 5-FU group (46.38%).
The missingness in the dataset is checked to determine its type, so that the analysis can be conducted accordingly.
Missing data pattern (1 = observed, 0 = missing). The variables id, study, rx, sex, age, obstruct, perfor, adhere, status, extent, surg and time are observed in every pattern.
| Participants | nodes | differ | Variables missing |
|---|---|---|---|
| 888 | 1 | 1 | 0 |
| 23 | 1 | 0 | 1 |
| 18 | 0 | 1 | 1 |
| Total missing values | 18 | 23 | 41 |
According to the above missingness pattern, the dataset has very little missing data, and there is no clear pattern to it. The missingness is therefore treated as missing completely at random (MCAR).
In total, 41 values are missing: 23 in differ and 18 in nodes.
To plot survival curves in R, the survival data need to be set up with the time and the status (censored versus event). The survfit function is then used to fit the Kaplan-Meier curve.
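To make the estimate behind the Kaplan-Meier curve concrete, here is a minimal standalone sketch of the product-limit calculation. It is written in plain Python rather than R, it is not the code used for this report, and the toy times and event indicators are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event = 1, censored = 0)."""
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    exit_counts = Counter(times)  # everyone leaving the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Events and censored participants both leave the risk set after t.
        at_risk -= exit_counts[t]
    return curve


# Hypothetical toy data: six participants, two of them censored.
toy_times = [5, 8, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(toy_times, toy_events)
for t, s in curve:
    print(t, round(s, 4))
```

Censored participants contribute to the risk set up to their censoring time but never produce a step down in the curve, which is why the estimate only changes at event times.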
The overall survival can be observed in this plot.
This plot shows the survival in all participants along with the 95% CIs (the gray area showing upper and lower bounds).
The curves show the survival of the participants between the 3 treatment groups.
Survival is observed to be best in the Levamisole + 5-FU group, whereas the observation and Levamisole groups show poorer, similar survival, and their curves overlap each other.
The curves cross slightly where they are plotted very close together, but there are no major deviations or crossovers in either the survival or the complementary log-log curves. The proportional hazards assumption of the Cox model therefore appears reasonable.
As the p-value (p = 1e-05) is less than alpha for both the log-rank and Wilcoxon tests, we can reject the null hypothesis and conclude that survival differs between the treatment groups.
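The log-rank test compares the observed number of events in a group with the number expected if all groups shared one survival curve. The sketch below is a simplified two-group version on hypothetical toy data. The test in this report compares three groups, so this shows the idea rather than reproducing the reported p-value.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df."""
    rows = [(t, e, 0) for t, e in zip(times_a, events_a)]
    rows += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in rows if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in rows if u >= t)                 # at risk, both groups
        n_a = sum(1 for u, _, g in rows if u >= t and g == 0)    # at risk, group A
        d = sum(1 for u, e, _ in rows if u == t and e == 1)      # events at t
        d_a = sum(1 for u, e, g in rows if u == t and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group A event count at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; the test is undefined for these data")
    stat = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical toy data: two groups of two participants, all with events.
stat, p_value = logrank_two_groups([1, 3], [1, 1], [2, 4], [1, 1])
print(round(stat, 4), round(p_value, 4))
```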
Though the survival curves for males and other sexes are similar, males show slightly better survival overall in this plot.
The above plot clearly shows that the more extensively the tumor has invaded, the worse the survival. The least extensive spread (involving only the submucosa) had the best survival, and spread to contiguous structures had the worst.
From the above plot, patients whose tumors are well or moderately differentiated have better survival; their curves lie close together and overlap. Patients with poorly differentiated tumors have the worst survival.
The curve for the shorter period between surgery and registration shows better survival than the curve for the longer period.
Cox proportional hazards (PH) models are commonly used in clinical trials to estimate the hazard ratio (HR) for each treatment group relative to a reference group. The higher the HR, the poorer the survival in that group, and vice versa.
The Cox PH model estimates the effect of each treatment group. The table below shows the estimates, 95% CIs and p-values for the Levamisole and Levamisole + 5-FU groups compared with the observation group. The estimates and bounds are on the log-hazard scale (model coefficients), so a negative value corresponds to an HR below 1.
| Term | Log Hazard Ratio (coefficient) | Lower Bound | Upper Bound | P-value |
|---|---|---|---|---|
| factor(rx)Levamisole | -0.02 | -0.22 | 0.19 | 0.89 |
| factor(rx)Levamisole+5-FU | -0.51 | -0.74 | -0.28 | 0.00 |
From the above table, only the Levamisole + 5-FU group shows a statistically significant benefit for survival: its coefficient of -0.51 corresponds to an HR of about 0.60 (exp(-0.51)) relative to the observation group. The Levamisole group shows no statistically significant difference from the observation group (p = 0.89).
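The values in the table above are negative in places, which an HR cannot be, so they are read here as log hazard ratios. Under that reading, exponentiating each estimate and its bounds gives the HR and its 95% CI. A small sketch of the conversion, using the rounded values from the table (the function name to_hazard_ratio is illustrative):

```python
import math


def to_hazard_ratio(coef, lower, upper):
    """Exponentiate a log hazard ratio and its CI bounds, rounded to 2 decimals."""
    return tuple(round(math.exp(x), 2) for x in (coef, lower, upper))


# Rounded coefficients and 95% CI bounds copied from the treatment-only table.
log_scale = {
    "Levamisole": (-0.02, -0.22, 0.19),
    "Levamisole+5-FU": (-0.51, -0.74, -0.28),
}
hazard_ratios = {term: to_hazard_ratio(*vals) for term, vals in log_scale.items()}
for term, (hr, lo, hi) in hazard_ratios.items():
    print(f"{term}: HR {hr} (95% CI {lo} to {hi})")
```

Because the inputs are already rounded to two decimals, the resulting HRs are approximate.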
Backwards elimination will be used to remove covariates whose estimates are not statistically significant (p-values greater than 0.05). Necessary covariates, such as the treatment group variable, will remain in all the models.
The output from the final model, which excludes the variables perfor, obstruct and stage, is shown below. The table shows the estimate for each variable and its sub-categories from the final Cox PH model, with the 95% CIs and the p-values. The estimates and bounds are on the log-hazard scale (model coefficients).
| Term | Log Hazard Ratio (coefficient) | Lower Bound | Upper Bound | P-value |
|---|---|---|---|---|
| factor(rx)Levamisole | -0.08 | -0.29 | 0.14 | 0.49 |
| factor(rx)Levamisole+5-FU | -0.52 | -0.76 | -0.28 | 0.00 |
| factor(sex)Other | 0.16 | -0.03 | 0.35 | 0.09 |
| age | 0.00 | -0.01 | 0.00 | 0.33 |
| factor(adhere)No | -0.19 | -0.44 | 0.07 | 0.15 |
| nodes | 0.08 | 0.06 | 0.10 | 0.00 |
| factor(differ)Moderate | -0.04 | -0.36 | 0.28 | 0.81 |
| factor(differ)Poor | 0.21 | -0.17 | 0.59 | 0.28 |
| factor(extent)Muscle | 0.41 | -0.63 | 1.45 | 0.43 |
| factor(extent)Serosa | 0.95 | -0.04 | 1.94 | 0.06 |
| factor(extent)Contiguous Structures | 1.33 | 0.26 | 2.40 | 0.01 |
| factor(surg)Short | -0.24 | -0.45 | -0.04 | 0.02 |
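In the above table, where the values are on the log-hazard scale, spread of cancer to contiguous structures had the highest estimate (1.33, 95% CI 0.26 to 2.40, p = 0.01), which is statistically significant. Among the treatment groups, the Levamisole + 5-FU group had the lowest estimate (-0.52), which is also statistically significant. The other statistically significant covariates are nodes (0.08 per additional positive node, indicating a higher hazard) and a short time from surgery to registration (-0.24, p = 0.02, indicating a lower hazard). The estimate for other sexes compared with males (0.16, p = 0.09) is not statistically significant.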
Hypotheses:
H0: Survival (time to recurrence) is not different between the treatment groups
H1: Survival (time to recurrence) is different between the treatment groups
Age
| Term | Log Hazard Ratio (coefficient) | Lower Bound | Upper Bound | P-value |
|---|---|---|---|---|
| factor(rx)Levamisole | -0.01 | -0.22 | 0.20 | 0.90 |
| factor(rx)Levamisole+5-FU | -0.51 | -0.74 | -0.28 | 0.00 |
| age | -0.01 | -0.01 | 0.00 | 0.11 |
Sex
| Term | Log Hazard Ratio (coefficient) | Lower Bound | Upper Bound | P-value |
|---|---|---|---|---|
| factor(rx)Levamisole | -0.01 | -0.22 | 0.20 | 0.92 |
| factor(rx)Levamisole+5-FU | -0.52 | -0.75 | -0.28 | 0.00 |
| sexOther | 0.11 | -0.08 | 0.29 | 0.25 |
According to the values in the above tables, adjusting for age or sex does not materially change the treatment estimates or their 95% CIs compared with the treatment-only model. Therefore, there is no evidence of confounding by age or sex.
Interaction between treatment group and time from surgery to registration is checked for this model. The purpose of this test is to observe whether the effect of treatment differs by time from surgery to registration.
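One way to quantify "no material change" is the percentage change in the HR between the treatment-only model and each adjusted model. The sketch below uses the rounded coefficients from the tables in this report. The 10% threshold is a common rule of thumb adopted here as an assumption, not a criterion stated by the study.

```python
import math

# Rounded log hazard ratios copied from the tables in this report.
crude = {"Levamisole": -0.02, "Levamisole+5-FU": -0.51}
adjusted = {
    "age": {"Levamisole": -0.01, "Levamisole+5-FU": -0.51},
    "sex": {"Levamisole": -0.01, "Levamisole+5-FU": -0.52},
}

THRESHOLD_PERCENT = 10.0  # assumed rule-of-thumb cut-off for confounding


def percent_change_in_hr(crude_coef, adjusted_coef):
    """Percentage change in the HR after adjustment for a covariate."""
    return (math.exp(adjusted_coef) / math.exp(crude_coef) - 1) * 100


changes = {
    (covariate, term): percent_change_in_hr(crude[term], coef)
    for covariate, coefs in adjusted.items()
    for term, coef in coefs.items()
}
for (covariate, term), change in changes.items():
    flag = "possible confounding" if abs(change) >= THRESHOLD_PERCENT else "no material change"
    print(f"adjusted for {covariate}, {term}: {change:+.2f}% ({flag})")
```

The change is computed on the HR scale rather than the coefficient scale because a coefficient near zero, such as the one for Levamisole, makes relative changes in the coefficient itself unstable.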
| Term | Log Hazard Ratio (coefficient) | Lower Bound | Upper Bound | P-value |
|---|---|---|---|---|
| factor(rx)Levamisole | 0.13 | -0.26 | 0.51 | 0.51 |
| factor(rx)Levamisole+5-FU | -0.26 | -0.68 | 0.16 | 0.23 |
| surgShort | -0.08 | -0.40 | 0.24 | 0.62 |
| factor(rx)Levamisole:surgShort | -0.19 | -0.65 | 0.27 | 0.41 |
| factor(rx)Levamisole+5-FU:surgShort | -0.34 | -0.85 | 0.16 | 0.18 |
There is no statistically significant interaction between the treatment groups and time from surgery to registration.
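To show how the interaction terms are read, the treatment effect within each surgery-to-registration category can be rebuilt by adding the main effect and the interaction coefficient. This sketch makes two assumptions: the table values are log hazard ratios, and "Long" is the reference category for surg (inferred from the term name surgShort). Since the interaction terms are not statistically significant, the difference between the two categories is not supported by the data and the output is illustrative only.

```python
import math

# Rounded coefficients copied from the interaction table in this report.
main_effect = {"Levamisole": 0.13, "Levamisole+5-FU": -0.26}
interaction_short = {"Levamisole": -0.19, "Levamisole+5-FU": -0.34}

stratum_hr = {}
for term in main_effect:
    # Reference category (assumed Long): the main effect alone.
    hr_long = math.exp(main_effect[term])
    # Short category: main effect plus the treatment-by-surg interaction.
    hr_short = math.exp(main_effect[term] + interaction_short[term])
    stratum_hr[term] = (round(hr_long, 2), round(hr_short, 2))

for term, (hr_long, hr_short) in stratum_hr.items():
    print(f"{term}: HR vs observation = {hr_long} (long), {hr_short} (short)")
```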