Importing the file and creating the data frame:
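# Point R at the folder holding the CSV, then load it into a data frame
setwd("C:/Users/lenovo/Desktop/se")
titanic.df <- read.csv("Titanic Data.csv")
View(titanic.df)   # opens the data viewer (interactive sessions only)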
Average age of the survivors and of the passengers who died:
aggregate(titanic.df$Age, by = list(Survived = titanic.df$Survived), FUN = mean)
##   Survived        x
## 1        0 30.41530
## 2        1 28.42382
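If the `Age` column contained missing values, `mean()` would return `NA` for any group that has one; the numeric output above suggests that is not the case for this file. As a minimal sketch of a more defensive alternative, the formula interface of `aggregate()` drops rows with missing values by default (`na.action = na.omit`). The small data frame `toy_na` below is made-up illustration data, not the Titanic file.

```r
# Hypothetical toy data standing in for titanic.df; one Age value is missing
toy_na <- data.frame(
  Age      = c(22, NA, 30, 40, 35),
  Survived = c(1,  1,  1,  0,  0)
)

# Formula method: rows with NA in Age or Survived are dropped before averaging
group_means <- aggregate(Age ~ Survived, data = toy_na, FUN = mean)
print(group_means)
```

On the real data the same call would be `aggregate(Age ~ Survived, data = titanic.df, FUN = mean)`; the result column is then named `Age` instead of `x`.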
T-test of whether the Titanic survivors were younger than the passengers who died:
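# Split the ages by outcome (Survived: 1 = survived, 0 = died)
lived <- titanic.df[which(titanic.df$Survived == 1), ]
lived_age <- lived$Age
died <- titanic.df[which(titanic.df$Survived == 0), ]
died_age <- died$Age
# Welch two-sample t-test (unequal variances, two-sided by default)
t.test(lived_age, died_age)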
##
## Welch Two Sample t-test
##
## data: lived_age and died_age
## t = -2.1816, df = 667.56, p-value = 0.02949
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.7838912 -0.1990628
## sample estimates:
## mean of x mean of y
##  28.42382  30.41530
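To make the numbers above less of a black box, here is a minimal sketch of what `t.test(x, y)` computes by default (`var.equal = FALSE`): the Welch statistic t = (mean(x) - mean(y)) / sqrt(s_x²/n_x + s_y²/n_y), with Welch-Satterthwaite degrees of freedom, which is why the df above (667.56) is not a whole number. The helper `welch_t` and the toy age vectors are hypothetical illustrations, not part of the original analysis.

```r
# Hypothetical helper: Welch t statistic, Welch-Satterthwaite df and two-sided p-value
welch_t <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p_two_sided <- 2 * pt(-abs(t_stat), df)
  c(t = t_stat, df = df, p = p_two_sided)
}

# Made-up ages for illustration only (not the Titanic data)
lived_toy <- c(22, 25, 28, 30, 24)
died_toy  <- c(35, 38, 40, 33, 36)

manual  <- welch_t(lived_toy, died_toy)
builtin <- t.test(lived_toy, died_toy)
print(manual)
```

Calling `welch_t(lived_age, died_age)` on the real vectors should reproduce the t, df and p-value printed by `t.test` above.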
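Conclusion:
The test above is two-sided: it asks whether the mean ages differ, not specifically whether survivors were younger. Its p-value (0.02949) is below the conventional 0.05 significance level, so we reject the null hypothesis of equal mean ages. Because the survivors' sample mean is lower (28.42 vs 30.42 years) and the whole 95% confidence interval for the difference (-3.78 to -0.20 years) lies below zero, the data support the claim that survivors were younger on average, by about 2 years. For the directional hypothesis itself, the one-sided p-value (alternative = "less") is half the two-sided value, about 0.0147, since the t statistic is negative. A significant result is evidence against the null hypothesis; it does not prove the alternative hypothesis TRUE.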
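Since the question is directional ("were survivors younger?"), a one-sided test states it directly. The sketch below assumes a data frame shaped like `titanic.df` (numeric `Age`, `Survived` coded 1/0); the function name `one_sided_age_test` and the `toy` data are hypothetical, used only so the example runs without the CSV.

```r
# Hypothetical helper: one-sided Welch test of H1: mean(Age | survived) < mean(Age | died)
one_sided_age_test <- function(df) {
  lived_age <- df$Age[which(df$Survived == 1)]
  died_age  <- df$Age[which(df$Survived == 0)]
  # alternative = "less" tests whether mean(lived_age) - mean(died_age) < 0
  t.test(lived_age, died_age, alternative = "less")
}

# Made-up data for illustration only (not the Titanic data)
toy <- data.frame(
  Age      = c(22, 25, 28, 30, 24, 35, 38, 40, 33, 36),
  Survived = c(1,  1,  1,  1,  1,  0,  0,  0,  0,  0)
)

res_one <- one_sided_age_test(toy)
res_two <- t.test(toy$Age[toy$Survived == 1], toy$Age[toy$Survived == 0])
print(res_one$p.value)
```

On the real data, `one_sided_age_test(titanic.df)` runs the directional test; when the t statistic is negative its p-value equals half of the two-sided one, as the comparison with `res_two` illustrates on the toy data.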