Survival rate of cancer patients with sodium ascorbate treatment.

Tian, James

Section: 3

Introduction:

Sodium ascorbate has been shown to selectively induce apoptosis in cancer cells (http://www.ncbi.nlm.nih.gov/pubmed/15672419 and http://www.ncbi.nlm.nih.gov/pubmed/16157892). In one study, 100 patients with cancers of the stomach, bronchus, colon, ovary, or breast were treated with ascorbate, and their subsequent days of survival were recorded. We are interested in whether mean survival time after ascorbate treatment differs by type of cancer.

Analysis:

The data set contains the days of survival of 64 cancer patients after treatment with ascorbate. Because the number of observations for each cancer type is relatively small, outliers can heavily skew the group means, so we remove them first. The first step is a five-number summary of days of survival for each cancer type, followed by removal of outliers using the 1.5*IQR rule.

Then I will conduct an ANOVA test to see whether the mean days of survival differ among the types of cancer.

Then I will estimate the true population mean of days of survival using the overall sample mean, and conduct a t-test for each type of cancer to see whether its mean survival days differ from that estimate.

The observations are independent of one another, and the patients enrolled in this study can be treated as a random sample. The distribution of days of survival is not exactly normal, but it is unimodal and roughly normal, and our sample size is 64, so we proceed with caution. We cannot verify this directly, but since all observations come from the same kind of cancer treatment, we assume that the residuals are normally distributed with roughly equal variances.

Exploratory Data Analysis

summary(data)

##     Survival         Organ   
##  Min.   :  20   Breast  :11  
##  1st Qu.: 102   Bronchus:17  
##  Median : 266   Colon   :17  
##  Mean   : 559   Ovary   : 6  
##  3rd Qu.: 721   Stomach :13  
##  Max.   :3808

boxplot(data$Survival ~ data$Organ)

[Plot: boxplot of Survival by Organ, full data]

dataStomach = subset(data, data$Organ == "Stomach")
dataBronchus = subset(data, data$Organ == "Bronchus")
dataColon = subset(data, data$Organ == "Colon")
dataOvary = subset(data, data$Organ == "Ovary")
dataBreast = subset(data, data$Organ == "Breast")

summary(dataStomach)

##     Survival         Organ   
##  Min.   :  25   Breast  : 0  
##  1st Qu.:  46   Bronchus: 0  
##  Median : 124   Colon   : 0  
##  Mean   : 286   Ovary   : 0  
##  3rd Qu.: 396   Stomach :13  
##  Max.   :1112

summary(dataBronchus)

##     Survival        Organ   
##  Min.   : 20   Breast  : 0  
##  1st Qu.: 72   Bronchus:17  
##  Median :155   Colon   : 0  
##  Mean   :212   Ovary   : 0  
##  3rd Qu.:245   Stomach : 0  
##  Max.   :859

summary(dataColon)

##     Survival         Organ   
##  Min.   :  20   Breast  : 0  
##  1st Qu.: 189   Bronchus: 0  
##  Median : 372   Colon   :17  
##  Mean   : 457   Ovary   : 0  
##  3rd Qu.: 519   Stomach : 0  
##  Max.   :1843

summary(dataOvary)

##     Survival         Organ  
##  Min.   :  89   Breast  :0  
##  1st Qu.: 240   Bronchus:0  
##  Median : 406   Colon   :0  
##  Mean   : 884   Ovary   :6  
##  3rd Qu.:1040   Stomach :0  
##  Max.   :2970

summary(dataBreast)

##     Survival         Organ   
##  Min.   :  24   Breast  :11  
##  1st Qu.: 723   Bronchus: 0  
##  Median :1166   Colon   : 0  
##  Mean   :1396   Ovary   : 0  
##  3rd Qu.:1692   Stomach : 0  
##  Max.   :3808

Upper cutoffs (Q3 + 1.5*IQR) for each type of cancer:

Stomach upper range = 921
Bronchus upper range = 505
Colon upper range = 1014
Ovary upper range = 2240
Breast upper range = 3146
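
As a worked check, the upper ranges above follow from the quartiles printed in the per-organ summaries. The sketch below recomputes them from those rounded values. The vector names q1, q3, iqr, upper, and lower are introduced here for illustration and are not part of the original analysis.

```r
# Quartiles copied from the summary() output of each subset above (rounded values)
q1 <- c(Stomach = 46, Bronchus = 72, Colon = 189, Ovary = 240, Breast = 723)
q3 <- c(Stomach = 396, Bronchus = 245, Colon = 519, Ovary = 1040, Breast = 1692)

iqr   <- q3 - q1          # interquartile range per type of cancer
upper <- q3 + 1.5 * iqr   # upper cutoff
lower <- q1 - 1.5 * iqr   # lower cutoff

print(upper)
print(lower)
```

Every lower cutoff comes out negative, and days of survival cannot be negative, so only the upper cutoffs can remove observations. The Bronchus and Breast values come out as 504.5 and 3145.5, which the report rounds to 505 and 3146.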

dataStomachNew = subset(dataStomach, dataStomach$Survival < 921)
dataBronchusNew = subset(dataBronchus, dataBronchus$Survival < 505)
dataColonNew = subset(dataColon, dataColon$Survival < 1014)
dataOvaryNew = subset(dataOvary, dataOvary$Survival < 2240)
dataBreastNew = subset(dataBreast, dataBreast$Survival < 3146)
dataNew <- rbind(dataStomachNew, dataBronchusNew, dataColonNew, dataOvaryNew, 
    dataBreastNew)
boxplot(dataNew$Survival ~ dataNew$Organ)

[Plot: boxplot of Survival by Organ, outliers removed]

summary(dataNew)

##     Survival         Organ   
##  Min.   :  20   Breast  : 9  
##  1st Qu.:  92   Bronchus:16  
##  Median : 234   Colon   :16  
##  Mean   : 374   Ovary   : 5  
##  3rd Qu.: 456   Stomach :12  
##  Max.   :1804
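
The hard-coded cutoffs used above could also be computed per group in code. The following is a minimal sketch, assuming a data frame with the same Survival and Organ columns as the report's data. The helper drop_upper_outliers and the toy data are hypothetical and only illustrate the idea.

```r
# Toy data frame with the same column names as the report's data (values are made up)
toy <- data.frame(
  Survival = c(1, 2, 3, 4, 100, 5, 6, 7, 8, 9),
  Organ    = rep(c("A", "B"), each = 5)
)

# Hypothetical helper: drop rows at or above Q3 + 1.5 * IQR within each Organ group
drop_upper_outliers <- function(df) {
  # One upper cutoff per group, named by group
  fence <- tapply(df$Survival, df$Organ,
                  function(x) quantile(x, 0.75) + 1.5 * IQR(x))
  # Look up each row's cutoff by its group and keep rows strictly below it
  keep <- df$Survival < as.vector(fence[as.character(df$Organ)])
  df[keep, ]
}

toy_clean <- drop_upper_outliers(toy)
print(toy_clean)
```

Because this version uses unrounded quartiles, its cutoffs can differ slightly from the rounded values used in the report.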

ANOVA

Now that we have the data set with the outliers removed, we conduct an ANOVA test. The conditions were already checked in the Analysis section.

inference(data = dataNew$Survival, group = dataNew$Organ, est = "mean", type = "ht", 
    alternative = "greater", method = "theoretical")

## Response variable: numerical, Explanatory variable: categorical
## ANOVA
## 
## Summary statistics:
## n_Breast = 9, mean_Breast = 898.6, sd_Breast = 617
## n_Bronchus = 16, mean_Bronchus = 171.1, sd_Bronchus = 131.5
## n_Colon = 16, mean_Colon = 370.8, sd_Colon = 242.2
## n_Ovary = 5, mean_Ovary = 467.2, sd_Ovary = 451.2
## n_Stomach = 12, mean_Stomach = 217.2, sd_Stomach = 252.3

## H_0: All means are equal.
## H_A: At least one mean is different.
## Analysis of Variance Table
## 
## Response: data
##           Df  Sum Sq Mean Sq F value  Pr(>F)
## group      4 3473712  868428    8.08 3.7e-05
## Residuals 53 5698923  107527                
## 
## Pairwise tests: t tests with pooled SD 
##          Breast Bronchus  Colon  Ovary
## Bronchus 0.0000       NA     NA     NA
## Colon    0.0003   0.0908     NA     NA
## Ovary    0.0221   0.0838 0.5686     NA
## Stomach  0.0000   0.7146 0.2253 0.1579
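
As a cross-check, the F statistic and p-value in the ANOVA table can be recomputed from the sums of squares and degrees of freedom printed there. This is only a sketch of the arithmetic; the variable names are introduced here for illustration.

```r
# Values copied from the ANOVA table above
ss_group <- 3473712; df_group <- 4
ss_resid <- 5698923; df_resid <- 53

ms_group <- ss_group / df_group   # mean square between groups
ms_resid <- ss_resid / df_resid   # mean square within groups
f_stat   <- ms_group / ms_resid   # F statistic

# Upper-tail probability of the F distribution with (4, 53) degrees of freedom
p_value <- pf(f_stat, df_group, df_resid, lower.tail = FALSE)

print(round(f_stat, 2))
print(p_value)
```

The degrees of freedom are also consistent with the data: 5 groups give 4 group degrees of freedom, and 58 observations minus 5 groups give 53 residual degrees of freedom.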

One-Sample t-Tests

We estimate the true population mean days of survival after treatment by the sample mean of the outlier-removed data, which is 374 days. The ANOVA output included pairwise t-tests between types of cancer. Here we instead run a one-sample t-test of each type of cancer against the estimated population mean.
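
The 374-day estimate can be cross-checked against the group sizes and group means in the ANOVA summary statistics, since the overall mean is the size-weighted average of the group means. The sketch below uses those rounded values; the variable names are introduced here for illustration.

```r
# Group sizes and means copied from the ANOVA summary statistics (rounded values)
n     <- c(Breast = 9, Bronchus = 16, Colon = 16, Ovary = 5, Stomach = 12)
means <- c(Breast = 898.6, Bronchus = 171.1, Colon = 370.8,
           Ovary = 467.2, Stomach = 217.2)

# Overall mean as the weighted average of the group means
overall_mean <- sum(n * means) / sum(n)

print(sum(n))
print(round(overall_mean))
```

Because the null value is computed from the same observations that are then tested against it, it is an estimate rather than a known population mean.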

inference(data = dataStomachNew$Survival, est = "mean", type = "ht", null = 374, 
    method = "theoretical", alternative = "twosided")

## Single mean 
## Summary statistics:

## mean = 217.1667 ;  sd = 252.2682 ;  n = 12 
## H0: mu = 374 
## HA: mu != 374 
## Standard error = 72.82 
## Test statistic: T = -2.154 
## Degrees of freedom:  11 
## p-value =  0.0544

inference(data = dataBronchusNew$Survival, est = "mean", type = "ht", null = 374, 
    method = "theoretical", alternative = "twosided")

## Single mean 
## Summary statistics:

## mean = 171.125 ;  sd = 131.4817 ;  n = 16 
## H0: mu = 374 
## HA: mu != 374 
## Standard error = 32.87 
## Test statistic: T = -6.172 
## Degrees of freedom:  15 
## p-value =  0

inference(data = dataColonNew$Survival, est = "mean", type = "ht", null = 374, 
    method = "theoretical", alternative = "twosided")

## Single mean 
## Summary statistics:

## mean = 370.8125 ;  sd = 242.1738 ;  n = 16 
## H0: mu = 374 
## HA: mu != 374 
## Standard error = 60.54 
## Test statistic: T = -0.053 
## Degrees of freedom:  15 
## p-value =  0.9588

inference(data = dataOvaryNew$Survival, est = "mean", type = "ht", null = 374, 
    method = "theoretical", alternative = "twosided")

## Single mean 
## Summary statistics:

## mean = 467.2 ;  sd = 451.2125 ;  n = 5 
## H0: mu = 374 
## HA: mu != 374 
## Standard error = 201.8 
## Test statistic: T = 0.462 
## Degrees of freedom:  4 
## p-value =  0.6682

inference(data = dataBreastNew$Survival, est = "mean", type = "ht", null = 374, 
    method = "theoretical", alternative = "twosided")

## Single mean 
## Summary statistics:

## mean = 898.5556 ;  sd = 616.9974 ;  n = 9 
## H0: mu = 374 
## HA: mu != 374 
## Standard error = 205.7 
## Test statistic: T = 2.551 
## Degrees of freedom:  8 
## p-value =  0.0342
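
The five one-sample tests above can be reproduced from the printed summary statistics alone: the standard error is sd / sqrt(n), the test statistic is (mean - 374) / SE, and the two-sided p-value comes from the t distribution with n - 1 degrees of freedom. The sketch below applies these formulas in base R; the data frame stats and the name null_mean are introduced here for illustration.

```r
# Summary statistics copied from the five outputs above
stats <- data.frame(
  organ = c("Stomach", "Bronchus", "Colon", "Ovary", "Breast"),
  mean  = c(217.1667, 171.125, 370.8125, 467.2, 898.5556),
  sd    = c(252.2682, 131.4817, 242.1738, 451.2125, 616.9974),
  n     = c(12, 16, 16, 5, 9)
)
null_mean <- 374  # estimated population mean used as the null value

stats$se <- stats$sd / sqrt(stats$n)               # standard error of the mean
stats$t  <- (stats$mean - null_mean) / stats$se    # test statistic
stats$df <- stats$n - 1                            # degrees of freedom
stats$p  <- 2 * pt(-abs(stats$t), df = stats$df)   # two-sided p-value

print(cbind(stats["organ"], round(stats[, c("se", "t", "p")], 4)))
```

The Bronchus p-value printed as 0 above is a rounded value; the computed value is very small but not exactly zero.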

Discussion:

We studied a sample of 64 observations of days of survival after ascorbate treatment for stomach, bronchus, colon, ovary, and breast cancer. After eliminating outliers for each type of cancer, we were left with 58 observations. The conditions for the ANOVA test and the t-tests were treated as met, but with caution: the residuals for each cancer may not have equal variances, and the ovary data set had only 5 observations after outliers were removed.
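
The caution about unequal variances can be made concrete with the group standard deviations reported in the ANOVA summary statistics. The sketch below compares the largest and smallest of them. The threshold of 2 is a common rule of thumb from introductory texts, used here as an assumption rather than a formal test.

```r
# Group standard deviations copied from the ANOVA summary statistics
sds <- c(Breast = 617, Bronchus = 131.5, Colon = 242.2,
         Ovary = 451.2, Stomach = 252.3)

# Ratio of the largest to the smallest group standard deviation
sd_ratio <- max(sds) / min(sds)

print(round(sd_ratio, 2))
print(sd_ratio > 2)  # TRUE suggests the equal-variance condition is doubtful
```

The ratio is well above 2, driven by the Breast and Bronchus groups, which supports treating the ANOVA result with caution.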

We conducted an ANOVA test and concluded that at least one type of cancer has a mean days of survival different from the others, with an F-statistic of 8.08 and a p-value of 3.7x10^-5. We then estimated the true population mean of days of survival after treatment by the overall sample mean (without outliers), which was 374 days. We conducted a t-test for each cancer against this estimate, keeping in mind the small sample size for some of the cancers. We found that the mean survival days for bronchus and breast cancer were statistically significantly different from this mean, with p-values of nearly 0 and 0.0342, respectively.

Our main limitation was the small sample size. We are unable to draw meaningful conclusions about ovarian cancer, since only 6 observations in total were collected. We also did not know the true population mean of days of survival, which we could only estimate from our sample. Further research could estimate the true population mean more accurately and expand the number of observations for each type of cancer, especially ovarian. It would also be helpful to have follow-up information on each patient so that we could remove observations in which the cancer was successfully treated and the patient died of another cause. Such observations skew the days of survival toward the high end.