The aim of this study is to test statistically whether the organ affected by cancer (the independent variable) has an effect on survival time (the dependent variable).
#Load the CSV data with read.csv() from the utils package
library(utils)
setwd("C:/Users/Alexis/Documents/Alexis/RPI/DoE")
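#stringsAsFactors = TRUE keeps Organ as a factor (no longer the default in R >= 4.0)
cancer_data <- read.csv("cancer-survival.csv", header = TRUE, stringsAsFactors = TRUE)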
#Show the first and last five rows of the data table
head(cancer_data, 5)
##   Survival   Organ
## 1      124 Stomach
## 2       42 Stomach
## 3       25 Stomach
## 4       45 Stomach
## 5      412 Stomach
tail(cancer_data, 5)
##    Survival  Organ
## 60     3808 Breast
## 61      791 Breast
## 62     1804 Breast
## 63     3460 Breast
## 64      719 Breast
levels(cancer_data$Organ)
## [1] "Breast" "Bronchus" "Colon" "Ovary" "Stomach"
sample_size <- nrow(cancer_data)
sample_size
## [1] 64
As I did not collect the data, I cannot say whether subjects were randomly selected. Random assignment does not apply here because the “treatment” (the organ affected) cannot be assigned to patients. Survival times were recorded for multiple patients within each organ group, so the study does include replicates. To limit confounding from many different cancers, only five types of cancer were studied. However, blocking does not appear to have been used against potential nuisance factors such as the therapeutics used, the duration of therapy, or the stage of cancer at diagnosis.
#Exploratory boxplot of survival time for each level of Organ
boxplotcand <- boxplot(cancer_data$Survival~cancer_data$Organ, xlab = "Organ", ylab = "Survival time (days)", main = "Survival time as a function of organ")
The main effect was estimated for Organ.
To determine the main effect of organ, which has five levels, I calculated the maximum, minimum, mean, and median survival time for each level. The maximum and minimum were not used because the data contain outliers, and the mean was not used because it was likely skewed by those outliers. I therefore estimated the main effect of organ as the difference between the largest median (breast) and the smallest median (stomach).
main_effect_survival = median_breast - median_stomach
main_effect_survival
## [1] 1042
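The variables `median_breast` and `median_stomach` used above are not defined in the code shown. A minimal sketch of how the per-organ summaries and the median difference might be computed follows, assuming the `Survival` and `Organ` columns from the data table. To stay self-contained it uses only the ten rows printed by `head()` and `tail()` above, held in a hypothetical data frame `cancer_subset`, so its result differs from the full-data value of 1042.

```r
# Hypothetical subset: only the ten rows printed by head() and tail() above.
cancer_subset <- data.frame(
  Survival = c(124, 42, 25, 45, 412, 3808, 791, 1804, 3460, 719),
  Organ    = factor(rep(c("Stomach", "Breast"), each = 5))
)

# Max, min, mean, and median survival time for each organ
# (one column per organ, one row per statistic).
organ_summary <- sapply(
  split(cancer_subset$Survival, cancer_subset$Organ),
  function(x) c(max = max(x), min = min(x), mean = mean(x), median = median(x))
)
organ_summary

# Main effect estimated as the largest median minus the smallest median.
organ_medians <- organ_summary["median", ]
subset_effect <- max(organ_medians) - min(organ_medians)
subset_effect
```

Replacing `cancer_subset` with the full `cancer_data` would give one column per organ, and the breast and stomach entries of `organ_medians` would play the roles of `median_breast` and `median_stomach`.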
#Compute Analysis of Variance for the organ main effect
anova_organ <- aov(cancer_data$Survival ~ cancer_data$Organ)
summary(anova_organ)
##                   Df   Sum Sq Mean Sq F value   Pr(>F)
## cancer_data$Organ  4 11535761 2883940   6.433 0.000229 ***
## Residuals         59 26448144  448274
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this one-factor, five-level experiment, the effect of organ on survival time was studied. A one-way ANOVA was used to test whether the differences between the levels were significant. With F = 6.433 on 4 and 59 degrees of freedom (p = 0.000229), the ANOVA indicates that the variation in survival time between organs is greater than would be expected from chance alone.
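The F statistic in the table is the ratio of the between-organ mean square to the residual mean square, and the p-value is the upper tail of the F distribution at that ratio. As a check on the arithmetic, the sketch below recomputes both from the sums of squares and degrees of freedom printed above. The variable names are illustrative, and the inputs are the rounded values from the table, so the results agree with it only up to rounding.

```r
# Values copied from the ANOVA table above (rounded as printed).
ss_organ <- 11535761; df_organ <- 4
ss_resid <- 26448144; df_resid <- 59

# Mean squares: sum of squares divided by degrees of freedom.
ms_organ <- ss_organ / df_organ
ms_resid <- ss_resid / df_resid

# F statistic and its upper-tail p-value under the null hypothesis
# of equal mean survival time across organs.
f_value <- ms_organ / ms_resid
p_value <- pf(f_value, df_organ, df_resid, lower.tail = FALSE)

round(f_value, 3)
signif(p_value, 3)
```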
This experiment uses a fixed-effects model: the levels of the factor are chosen deliberately rather than sampled at random, so conclusions apply only to the levels studied and cannot be generalized to a wider population of levels. In this experiment, the levels of organ were fixed to breast, bronchus, colon, ovary, and stomach.
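For reference, the standard one-way fixed-effects model behind this kind of analysis can be written as below. The symbols are conventional textbook notation rather than the author's own: $y_{ij}$ is the survival time of patient $j$ with cancer of organ $i$, $\mu$ is the overall mean, $\tau_i$ is the fixed effect of organ $i$, $n_i$ is the number of patients for organ $i$, and $\varepsilon_{ij}$ is the random error, assumed independent with common variance $\sigma^2$.

```latex
y_{ij} = \mu + \tau_i + \varepsilon_{ij},
\qquad i = 1, \dots, 5, \quad j = 1, \dots, n_i,
\qquad \varepsilon_{ij} \sim N(0, \sigma^2)

H_0 : \tau_1 = \tau_2 = \dots = \tau_5 = 0
\qquad \text{versus} \qquad
H_1 : \tau_i \neq 0 \text{ for at least one } i
```

The ANOVA F test above is a test of $H_0$ in this notation.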