Did age play a role in survival chance?
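tit_df <- read.csv("Titanic Data.csv")
# attach() puts the columns on the search path as they are now, so the
# attached Survived keeps its 0/1 coding even after the recoding below
attach(tit_df)
tit_df$Survived <- factor(tit_df$Survived, levels = c(0, 1), labels = c("No", "Yes"))
# Mean age within each survival group (0 = died, 1 = survived)
mean_age_sur <- aggregate(Age, by = list(Survived), mean)
colnames(mean_age_sur) <- c("Survival", "Mean age")
mean_age_sur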
## Survival Mean age
## 1 0 30.41530
## 2 1 28.42382
The table shows that the mean age of the survivors is only about 2 years lower than that of the passengers who died.
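The same group means can also be computed without attach(), using the formula interface of aggregate(). The sketch below is illustrative only: `toy_df` and its values are made up and are not the Titanic data. With the formula interface, rows with a missing Age are dropped by default before the mean is taken.

```r
# Toy data for illustration only; NOT the Titanic data set.
toy_df <- data.frame(
  Survived = c(0, 0, 0, 1, 1, 1),
  Age      = c(40, 30, 20, 35, 25, 15)
)

# Formula interface: mean of Age within each level of Survived.
# No attach() is needed because the data frame is passed explicitly.
toy_means <- aggregate(Age ~ Survived, data = toy_df, FUN = mean)
colnames(toy_means) <- c("Survival", "Mean age")
toy_means
```

Passing `data =` explicitly keeps the calculation tied to the current state of the data frame, so a later recoding of a column is not silently ignored.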
Boxplot and t-test
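# A boxplot shows medians and quartiles rather than means, so the title refers to age in general
boxplot(Age ~ Survived, main = "Age of survivors and victims", xlab = "Survival", ylab = "Age", col = c("red", "green"))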

# Welch two-sample t-test (the default, var.equal = FALSE)
t.test(Age ~ Survived)
##
## Welch Two Sample t-test
##
## data: Age by Survived
## t = 2.1816, df = 667.56, p-value = 0.02949
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.1990628 3.7838912
## sample estimates:
## mean in group 0 mean in group 1
## 30.41530 28.42382
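The printed result can also be stored and its pieces read off directly, which helps when the numbers are quoted in the interpretation. This is a minimal sketch on made-up data: `toy_df` and its values are assumptions for illustration and are not the Titanic data.

```r
# Toy data for illustration only; NOT the Titanic data set.
toy_df <- data.frame(
  Survived = c(0, 0, 0, 0, 1, 1, 1, 1),
  Age      = c(42, 35, 28, 51, 30, 22, 26, 19)
)

# t.test() returns an object of class "htest"; by default it runs the
# Welch test, which does not assume equal variances.
res <- t.test(Age ~ Survived, data = toy_df)

res$method     # name of the test that was run
res$statistic  # t statistic
res$p.value    # two-sided p-value
res$conf.int   # 95% CI for (mean of group 0) - (mean of group 1)
res$estimate   # the two group means
```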
Interpretation:
1. The average age of the survivors is 28.42 years and that of the victims is 30.42 years.
2. The boxplot shows that the survivors and the victims had almost the same median age of around 30 years. However, the victims include outliers who are older than 50 years.
3. The p-value from the t-test is approximately 0.03. This is not very small, but it is below 0.05, so we reject the null hypothesis that survivors and victims have the same mean age at the 5% significance level. The 95% confidence interval for the difference (victims minus survivors) runs from about 0.2 to 3.8 years, so the difference is statistically significant but small.
4. The boxplot also shows a considerable number of outliers older than 50 among those who did not survive, which supports the H2 statement that the survivors were younger than the victims.
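The test reported above is two-sided, while the claim that survivors were younger is directional. A one-sided version could be sketched as follows, assuming the groups are ordered 0 (died) then 1 (survived), so that `alternative = "greater"` tests whether the victims' mean age is larger. The data frame `toy_df` and its values are made up for illustration and are not the Titanic data.

```r
# Toy data for illustration only; NOT the Titanic data set.
toy_df <- data.frame(
  Survived = c(0, 0, 0, 0, 1, 1, 1, 1),
  Age      = c(42, 35, 28, 51, 30, 22, 26, 19)
)

# Two-sided Welch test: H1 is "the two group means differ".
two_sided <- t.test(Age ~ Survived, data = toy_df)

# One-sided Welch test: H1 is "mean age in group 0 > mean age in group 1".
one_sided <- t.test(Age ~ Survived, data = toy_df, alternative = "greater")

two_sided$p.value
one_sided$p.value
```

When the observed difference lies in the hypothesised direction, the one-sided p-value is half the two-sided one. The direction should be chosen before looking at the data.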