Sodium ascorbate has been shown to selectively induce apoptosis in cancer cells (http://www.ncbi.nlm.nih.gov/pubmed/15672419 and http://www.ncbi.nlm.nih.gov/pubmed/16157892). In a study, 100 patients with cancer of the stomach, bronchus, colon, ovary, or breast were treated with ascorbate, and their days of survival after treatment were recorded. We are interested in whether the mean survival time after ascorbate treatment differs by cancer type.
The data set contains the days of survival of 64 cancer patients after treatment with ascorbate. Because each cancer type has relatively few observations, a few extreme values can noticeably skew the group means, so we first remove outliers. The first analysis is a five-number summary of days of survival for each cancer type, followed by removal of outliers using the 1.5*IQR rule.
Next, I will conduct an ANOVA test to determine whether mean days of survival differ across cancer types.
Then I will estimate the true population mean of days of survival by the overall sample mean, and conduct a one-sample t-test for each cancer type to see whether its mean survival differs from this estimate.
The observations are independent of one another, and the patients enrolled in this study can be treated as a random sample. The distribution of days of survival is not normal: it is right-skewed (the overall mean of 559 days is well above the median of 266 days). However, since it is unimodal and the sample size is 64, we can proceed with caution. We cannot fully verify this, but because all patients received the same ascorbate treatment, we assume that the residuals are approximately normal with roughly equal variances.
summary(data)
## Survival Organ
## Min. : 20 Breast :11
## 1st Qu.: 102 Bronchus:17
## Median : 266 Colon :17
## Mean : 559 Ovary : 6
## 3rd Qu.: 721 Stomach :13
## Max. :3808
boxplot(data$Survival ~ data$Organ)
dataStomach = subset(data, data$Organ == "Stomach")
dataBronchus = subset(data, data$Organ == "Bronchus")
dataColon = subset(data, data$Organ == "Colon")
dataOvary = subset(data, data$Organ == "Ovary")
dataBreast = subset(data, data$Organ == "Breast")
summary(dataStomach)
## Survival Organ
## Min. : 25 Breast : 0
## 1st Qu.: 46 Bronchus: 0
## Median : 124 Colon : 0
## Mean : 286 Ovary : 0
## 3rd Qu.: 396 Stomach :13
## Max. :1112
summary(dataBronchus)
## Survival Organ
## Min. : 20 Breast : 0
## 1st Qu.: 72 Bronchus:17
## Median :155 Colon : 0
## Mean :212 Ovary : 0
## 3rd Qu.:245 Stomach : 0
## Max. :859
summary(dataColon)
## Survival Organ
## Min. : 20 Breast : 0
## 1st Qu.: 189 Bronchus: 0
## Median : 372 Colon :17
## Mean : 457 Ovary : 0
## 3rd Qu.: 519 Stomach : 0
## Max. :1843
summary(dataOvary)
## Survival Organ
## Min. : 89 Breast :0
## 1st Qu.: 240 Bronchus:0
## Median : 406 Colon :0
## Mean : 884 Ovary :6
## 3rd Qu.:1040 Stomach :0
## Max. :2970
summary(dataBreast)
## Survival Organ
## Min. : 24 Breast :11
## 1st Qu.: 723 Bronchus: 0
## Median :1166 Colon : 0
## Mean :1396 Ovary : 0
## 3rd Qu.:1692 Stomach : 0
## Max. :3808
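Upper fences (Q3 + 1.5*IQR): Stomach = 921, Bronchus = 505, Colon = 1014, Ovary = 2240, Breast = 3146. The lower fences (Q1 - 1.5*IQR) are negative for every cancer type, so only values above the upper fence are removed.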
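The fences follow directly from the quartiles in each per-organ summary. A short base-R sketch of the arithmetic, with Q1 and Q3 copied from the summary() output above (summary() rounds for display, so these fences are approximate):

```r
# Q1 and Q3 for each organ, copied from the summary() output above
q1 <- c(Stomach = 46, Bronchus = 72, Colon = 189, Ovary = 240, Breast = 723)
q3 <- c(Stomach = 396, Bronchus = 245, Colon = 519, Ovary = 1040, Breast = 1692)

iqr   <- q3 - q1          # interquartile range per organ
upper <- q3 + 1.5 * iqr   # upper fences: 921, 504.5, 1014, 2240, 3145.5
lower <- q1 - 1.5 * iqr   # lower fences: all negative
upper
lower
```

The Bronchus (504.5) and Breast (3145.5) fences were rounded up to 505 and 3146 in the subset() calls.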
dataStomachNew = subset(dataStomach, dataStomach$Survival < 921)
dataBronchusNew = subset(dataBronchus, dataBronchus$Survival < 505)
dataColonNew = subset(dataColon, dataColon$Survival < 1014)
dataOvaryNew = subset(dataOvary, dataOvary$Survival < 2240)
dataBreastNew = subset(dataBreast, dataBreast$Survival < 3146)
dataNew <- rbind(dataStomachNew, dataBronchusNew, dataColonNew, dataOvaryNew,
dataBreastNew)
boxplot(dataNew$Survival ~ dataNew$Organ)
summary(dataNew)
## Survival Organ
## Min. : 20 Breast : 9
## 1st Qu.: 92 Bronchus:16
## Median : 234 Colon :16
## Mean : 374 Ovary : 5
## 3rd Qu.: 456 Stomach :12
## Max. :1804
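The thresholds in the subset() calls above were typed in by hand from the rounded summary() output. As a sketch only (a hypothetical helper, not the original author's code), the same 1.5*IQR rule can be applied within every group in one step. It is demonstrated on made-up toy data because the raw data set is not reproduced here.

```r
# Hypothetical helper (not the original author's code): keep only rows whose
# value lies inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR], with the fences computed
# separately within each group.
remove_iqr_outliers <- function(df, value, group) {
  keep <- ave(df[[value]], df[[group]], FUN = function(x) {
    # quantile() type 7 is R's default, the same definition summary() uses
    q <- quantile(x, c(0.25, 0.75), names = FALSE, type = 7)
    fence_width <- 1.5 * (q[2] - q[1])
    x >= q[1] - fence_width & x <= q[2] + fence_width
  })
  # ave() stores the TRUE/FALSE flags in a numeric vector (1/0), so convert back
  df[as.logical(keep), , drop = FALSE]
}

# Made-up toy data for illustration: group "A" contains one extreme value (200)
toy <- data.frame(
  Survival = c(10, 12, 14, 16, 200, 30, 32, 34, 36, 38),
  Organ = rep(c("A", "B"), each = 5)
)
cleaned <- remove_iqr_outliers(toy, "Survival", "Organ")
nrow(cleaned)  # 9: only the value 200 is dropped
```

With the real data the call would be `remove_iqr_outliers(data, "Survival", "Organ")`, assuming the columns shown by `summary(data)`. Because the helper uses exact rather than rounded fences and keeps values equal to a fence (the code above uses a strict `<`), its result is not guaranteed to match `dataNew` for values lying right at a fence.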
With the outliers removed, we now conduct an ANOVA test. The conditions were checked earlier in the analysis.
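One quick, informal check of the equal-variance condition compares the largest and smallest group standard deviations. The sketch below uses the SDs printed in the ANOVA summary statistics below (outliers already removed); the "less than twice" threshold is one common rule of thumb, assumed here rather than taken from the original analysis. With the raw data, `bartlett.test(Survival ~ Organ, data = dataNew)` is a formal base-R alternative, although it is itself sensitive to non-normality.

```r
# Group standard deviations as printed in the ANOVA summary statistics below
sds <- c(Breast = 617, Bronchus = 131.5, Colon = 242.2, Ovary = 451.2, Stomach = 252.3)

# Rule of thumb (an assumption, not part of the original analysis): treat the
# group variances as roughly equal when the largest SD is under twice the smallest
sd_ratio <- max(sds) / min(sds)
round(sd_ratio, 2)   # 4.69
sd_ratio < 2         # FALSE
```

By this rule the breast group is far more spread out than the bronchus group, which is one reason to interpret the ANOVA with caution.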
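# inference() is not part of base R; this assumes the helper function from the
# Coursera/OpenIntro "Data Analysis and Statistical Inference" materials is loaded.
# The ANOVA F test is upper-tailed, hence alternative = "greater".
inference(data = dataNew$Survival, group = dataNew$Organ, est = "mean", type = "ht",
  alternative = "greater", method = "theoretical")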
## Response variable: numerical, Explanatory variable: categorical
## ANOVA
##
## Summary statistics:
## n_Breast = 9, mean_Breast = 898.6, sd_Breast = 617
## n_Bronchus = 16, mean_Bronchus = 171.1, sd_Bronchus = 131.5
## n_Colon = 16, mean_Colon = 370.8, sd_Colon = 242.2
## n_Ovary = 5, mean_Ovary = 467.2, sd_Ovary = 451.2
## n_Stomach = 12, mean_Stomach = 217.2, sd_Stomach = 252.3
## H_0: All means are equal.
## H_A: At least one mean is different.
## Analysis of Variance Table
##
## Response: data
## Df Sum Sq Mean Sq F value Pr(>F)
## group 4 3473712 868428 8.08 3.7e-05
## Residuals 53 5698923 107527
##
## Pairwise tests: t tests with pooled SD
## Breast Bronchus Colon Ovary
## Bronchus 0.0000 NA NA NA
## Colon 0.0003 0.0908 NA NA
## Ovary 0.0221 0.0838 0.5686 NA
## Stomach 0.0000 0.7146 0.2253 0.1579
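The F statistic can be reproduced from the group summary statistics alone, which makes the ANOVA decomposition explicit: the between-group sum of squares is the sum of n_i (mean_i - overall mean)^2, and the within-group sum of squares is the sum of (n_i - 1) sd_i^2. A base-R sketch using the rounded values printed above, so the sums of squares match the table only approximately; on the raw data, `anova(lm(Survival ~ Organ, data = dataNew))` would produce the table directly.

```r
# Group sizes, means and SDs from the ANOVA summary statistics above
n <- c(Breast = 9, Bronchus = 16, Colon = 16, Ovary = 5, Stomach = 12)
m <- c(898.6, 171.1, 370.8, 467.2, 217.2)
s <- c(617, 131.5, 242.2, 451.2, 252.3)

k <- length(n)            # number of groups (5)
N <- sum(n)               # total observations (58)
grand <- sum(n * m) / N   # overall mean, about 374.1

ss_group <- sum(n * (m - grand)^2)   # between-group sum of squares
ss_resid <- sum((n - 1) * s^2)       # within-group (residual) sum of squares

f_stat <- (ss_group / (k - 1)) / (ss_resid / (N - k))
p_val  <- pf(f_stat, k - 1, N - k, lower.tail = FALSE)
c(F = f_stat, p = p_val)  # F about 8.08, p about 3.7e-05
```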
We estimate the true population mean days of survival after treatment by the overall sample mean of the outlier-removed data, which is 374 days. The ANOVA output above already includes pairwise t-tests between each pair of cancer types; below, we instead run a one-sample t-test comparing each cancer type with this estimated population mean.
inference(data = dataStomachNew$Survival, est = "mean", type = "ht", null = 374,
method = "theoretical", alternative = "twosided")
## Single mean
## Summary statistics:
## mean = 217.1667 ; sd = 252.2682 ; n = 12
## H0: mu = 374
## HA: mu != 374
## Standard error = 72.82
## Test statistic: T = -2.154
## Degrees of freedom: 11
## p-value = 0.0544
inference(data = dataBronchusNew$Survival, est = "mean", type = "ht", null = 374,
method = "theoretical", alternative = "twosided")
## Single mean
## Summary statistics:
## mean = 171.125 ; sd = 131.4817 ; n = 16
## H0: mu = 374
## HA: mu != 374
## Standard error = 32.87
## Test statistic: T = -6.172
## Degrees of freedom: 15
## p-value = 0
inference(data = dataColonNew$Survival, est = "mean", type = "ht", null = 374,
method = "theoretical", alternative = "twosided")
## Single mean
## Summary statistics:
## mean = 370.8125 ; sd = 242.1738 ; n = 16
## H0: mu = 374
## HA: mu != 374
## Standard error = 60.54
## Test statistic: T = -0.053
## Degrees of freedom: 15
## p-value = 0.9588
inference(data = dataOvaryNew$Survival, est = "mean", type = "ht", null = 374,
method = "theoretical", alternative = "twosided")
## Single mean
## Summary statistics:
## mean = 467.2 ; sd = 451.2125 ; n = 5
## H0: mu = 374
## HA: mu != 374
## Standard error = 201.8
## Test statistic: T = 0.462
## Degrees of freedom: 4
## p-value = 0.6682
inference(data = dataBreastNew$Survival, est = "mean", type = "ht", null = 374,
method = "theoretical", alternative = "twosided")
## Single mean
## Summary statistics:
## mean = 898.5556 ; sd = 616.9974 ; n = 9
## H0: mu = 374
## HA: mu != 374
## Standard error = 205.7
## Test statistic: T = 2.551
## Degrees of freedom: 8
## p-value = 0.0342
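Each test above computes t = (mean - 374) / (sd / sqrt(n)) and compares it with a t distribution on n - 1 degrees of freedom. A base-R sketch that recomputes all five tests from the printed means, standard deviations and sample sizes; on the raw data, `t.test(dataStomachNew$Survival, mu = 374)` runs the same two-sided test for a single group.

```r
# Sizes, means and SDs of each outlier-removed group, from the outputs above
n  <- c(Stomach = 12, Bronchus = 16, Colon = 16, Ovary = 5, Breast = 9)
xb <- c(217.1667, 171.125, 370.8125, 467.2, 898.5556)
s  <- c(252.2682, 131.4817, 242.1738, 451.2125, 616.9974)
mu0 <- 374  # estimated population mean used as the null value

se     <- s / sqrt(n)                        # standard error of each mean
t_stat <- (xb - mu0) / se                    # one-sample t statistic
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value

round(cbind(n, se, t_stat, p_val), 4)
```

Small differences from the printed values come only from rounding of the inputs. The bronchus p-value prints as 0 above because it is below the displayed precision, not because it is exactly zero.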
We studied a sample of 64 observations of days of survival after ascorbate treatment for stomach, bronchus, colon, ovary, and breast cancer. After eliminating outliers within each cancer type, 58 observations remained. The conditions for the ANOVA and t-tests were approximately met, but we proceeded with caution for two reasons: the group variances may not be equal (standard deviations range from 131.5 days for bronchus to 617 days for breast), and the ovary group had only 5 observations after outliers were removed.
The ANOVA test showed that at least one cancer type has a mean survival time different from the others, with an F-statistic of 8.08 and a p-value of 3.7 × 10^-5. We then estimated the true population mean days of survival after treatment by the overall sample mean without outliers, 374 days, and conducted a one-sample t-test for each cancer type against this value; the t distribution accounts for the small sample sizes of some groups. At the 5% significance level, the mean survival times for bronchus cancer (171.1 days) and breast cancer (898.6 days) were statistically significantly different from 374 days, with p-values of nearly 0 and 0.0342, respectively.
Our main limitation was the small sample size. We cannot draw meaningful conclusions about ovarian cancer, since only 6 observations were collected in total. We also did not know the true population mean of days of survival; we only estimated it from our sample. Further research could estimate the population mean more accurately and expand the number of observations for each cancer type, especially ovarian cancer. It would also help to have follow-up information on each patient, so that observations in which the cancer was successfully treated and the patient died of another cause could be removed, since these observations skew the days of survival toward the high end.