The average age of the survivors (Survived = 1) is 28.424, and the average age of those who died (Survived = 0) is 30.415.
tsa.df <- read.csv("Titanic Data.csv")
aggregate(tsa.df$Age, by=list(tsa.df$Survived), mean)
## Group.1 x
## 1 0 30.41530
## 2 1 28.42382
The boxplot of age by survival status shows more outliers above the upper whisker among those who died than among the survivors.
boxplot(tsa.df$Age~tsa.df$Survived)
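The outlier comparison can also be checked numerically rather than read off the plot. The following is a minimal sketch, assuming Survived is coded 0/1; toy.df and its ages are made up for illustration and only stand in for tsa.df. It counts the points that boxplot() would draw beyond the whiskers (the default 1.5 x IQR rule) in each group.

```r
# Hypothetical toy data standing in for tsa.df; the ages are made up.
toy.df <- data.frame(
  Age      = c(20, 22, 24, 25, 26, 28, 30, 71,
               19, 21, 23, 27, 29, 31, 33, 35),
  Survived = rep(c(0, 1), each = 8)
)

# boxplot.stats()$out returns the values lying beyond the whiskers,
# i.e. the points that boxplot() draws individually as outliers.
n.out <- sapply(
  split(toy.df$Age, toy.df$Survived),
  function(a) length(boxplot.stats(a)$out)
)

n.out  # number of outliers per Survived group
```

With the real data, the same call on split(tsa.df$Age, tsa.df$Survived) should give the per-group counts behind the statement above.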
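The p-value (< 2.2e-16) means that a result this extreme would be very unlikely if the null hypothesis of a zero mean difference were true. Hence, the null hypothesis can be rejected. Note, however, that the test reported here is a paired t-test of Age against the 0/1 Survived indicator, so it tests whether the mean of Age minus Survived is zero, not whether the mean age differs between survivors and those who died.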
##
## Paired t-test
##
## data: tsa.df$Age and tsa.df$Survived
## t = 67.065, df = 888, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 28.41458 30.12782
## sample estimates:
## mean of the differences
## 29.2712
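The output above pairs each passenger's Age with the Survived indicator, so a comparison of mean age between the two groups would more naturally use a two-sample test. The following is a minimal sketch, assuming Survived is coded 0/1; toy.df and its values are made up for illustration and only stand in for tsa.df. It is not the test that produced the output above.

```r
# Hypothetical toy data standing in for tsa.df; the ages are made up.
toy.df <- data.frame(
  Age      = c(22, 38, 26, 35, 54, 2, 27, 14, 58, 20),
  Survived = c(0, 1, 1, 1, 0, 0, 1, 1, 1, 0)
)

# Welch two-sample t-test (the default for the formula interface):
# does mean Age differ between the Survived = 0 and Survived = 1 groups?
res <- t.test(Age ~ Survived, data = toy.df)

res$estimate  # the two group means, comparable to the aggregate() output
res$p.value   # p-value for the difference in group means
```

On the real data the corresponding call would be t.test(Age ~ Survived, data = tsa.df), and its sample estimates should match the two means returned by aggregate().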