Colon Cancer Survival using Random Forests

Steven J. Rigatti, MD
AVP, Medical Director

Colon Cancer Survival

Colon cancer is the 3rd leading cause of cancer death in both men and women in the United States.

Being able to ascertain the risk of death for a given individual is vital when making treatment decisions:

  • If the risk of death is very high, it may make sense either to pursue aggressive treatment or to institute comfort measures only
  • If the risk is very low, treatment can be tailored to minimize toxicity
  • As a doctor, I have often wanted to have access to cancer survival data in order to better inform my patients and my treatment recommendations. I have developed a Shiny app to do exactly that.

What it Does

  • This app predicts conditional survival to the end of the 5th year (60th month) for patients with a recent diagnosis of colorectal cancer. Conditional survival is the probability of surviving to a given end point, conditioned on the subject surviving for some amount of time already.

    Input values include:

  • Demographics (Age and Sex)

  • T-stage (Tumor Stage) T1, T2, T3 or T4 (how deeply the tumor has invaded the colon wall)

  • N-stage (Nodal Stage) N0, N1 or N2 (reflecting the number of nearby lymph nodes involved)

  • M-stage (Metastasis Stage), M0 or M1 (the presence or absence of metastases)

  • Conditional survival time - the number of months the individual has already survived since diagnosis
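
The conditional survival calculation can be sketched in a few lines of base R. This is a minimal illustration, not the app's actual code: the helper name conditional_survival and the survival curve below are made up for the example. It divides cumulative survival at the end point (month 60) by cumulative survival at the month the patient has already reached.

```r
# Hypothetical helper (not from the app): conditional survival to `end_month`,
# given that the patient has already survived to `start_month`.
# `surv` holds cumulative survival probabilities, one per month: surv[m] = S(m).
conditional_survival <- function(surv, start_month, end_month = 60) {
  stopifnot(start_month >= 0, start_month < end_month, end_month <= length(surv))
  # S(0) = 1 by definition: every patient is alive at diagnosis.
  s_start <- if (start_month == 0) 1 else surv[start_month]
  surv[end_month] / s_start
}

# Made-up curve for illustration only (NOT SEER data): constant 1% monthly hazard.
surv <- 0.99^(1:60)

conditional_survival(surv, start_month = 0)   # unconditional 5-year survival, S(60)
conditional_survival(surv, start_month = 24)  # S(60) / S(24)
```

With a start month of 0 the result is simply the ordinary 5-year survival; the longer a patient has already survived, the higher the conditional figure becomes.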

Data

  • The survival data comes from the SEER (Surveillance, Epidemiology, and End Results) program of the National Cancer Institute
  • This is a population-based cancer registry; the SEER 18 registries cover approximately 28% of the United States population
  • The search was restricted to adenocarcinoma of the colon or rectum in the SEER 18 data, for patients aged 50 to 69 and diagnosed between 2004 and 2007

Methods and Notes

  • The package 'randomForestSRC' was used to construct a random forest survival model
  • This method grows a single forest of survival trees and estimates the ensemble survival function at each unique event time (months 1-60 in this case)
  • Predictions are then made from the input values supplied through the app's widgets
  • The 'plot.predict' function can then be used to display a survival curve
  • Conditional survival is calculated as the cumulative survival at 60 months divided by the cumulative survival at the starting month

  • IMPORTANT NOTE: The random forest object is large and the figure in the app takes a few seconds to refresh.

  • MORE IMPORTANT NOTE: This app has not been validated and should not be used for any real-world purpose

  • EXCITING NOTE: This proof-of-concept app could be expanded to include all forms of cancer and could become a useful tool if validated by peer review
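
The modeling steps listed under Methods and Notes can be sketched roughly as follows. This is an assumption-laden illustration rather than the app's code: it uses the veteran lung cancer data set that ships with randomForestSRC as a stand-in for the SEER extract (so time is in days, not months), an arbitrary number of trees, and base R plotting instead of the package's own plotting function.

```r
library(randomForestSRC)

# Stand-in data: the `veteran` trial data bundled with the package (time in days).
# This is NOT the SEER colorectal extract used by the app.
data(veteran, package = "randomForestSRC")

set.seed(1)  # forests are random; fix the seed for repeatability
fit <- rfsrc(Surv(time, status) ~ ., data = veteran, ntree = 200)

# Predict for one hypothetical patient (here simply the first row of the data).
pred <- predict(fit, newdata = veteran[1, ])

times <- pred$time.interest          # unique event times seen in training
surv  <- as.numeric(pred$survival)   # ensemble survival at each of those times

# Step-function lookup of S(t); S(t) = 1 before the first event time.
S <- stepfun(times, c(1, surv))

# Conditional survival to day 180, given survival to day 60.
cond <- S(180) / S(60)

# Display the predicted survival curve.
plot(times, surv, type = "s", xlab = "Days", ylab = "Predicted survival")
```

In the app itself the end point would be month 60 and the starting month would come from the conditional survival time input.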