This report presents a t-test analysis comparing the ages of the Titanic survivors with the ages of the passengers who died.
setwd("D:/desktop/Data Analytics internship-sameer mathur/work/datasets")
titanic.df <- read.csv("Titanic Data.csv")
View(titanic.df)
4b: Table showing the average age of the survivors and the average age of the passengers who died
survived.mean <- aggregate(titanic.df$Age, list(titanic.df$Survived), mean)
colnames(survived.mean)[2] <- "Age"
colnames(survived.mean)[1] <- "Survived or Not"
survived.mean
##   Survived or Not      Age
## 1               0 30.41530
## 2               1 28.42382
The average age of the survivors (Survived = 1) is about 28.4 years, and the average age of the passengers who died (Survived = 0) is about 30.4 years.
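The same kind of table can also be built with the formula interface of `aggregate`, which keeps the column names and makes the two `colnames` calls unnecessary. This is a minimal sketch, assuming a small made-up data frame: `toy.df` and its values are hypothetical and are used only so the snippet runs without the Titanic file.

```r
# Hypothetical toy data standing in for titanic.df (values are made up)
toy.df <- data.frame(
  Survived = c(0, 0, 0, 1, 1, 1),
  Age      = c(22, 35, 40, 38, 26, 20)
)

# Formula interface: mean Age within each level of Survived
toy.mean <- aggregate(Age ~ Survived, data = toy.df, FUN = mean)
toy.mean
```

Unlike the `aggregate(x, by, FUN)` form, the formula method drops rows with missing values by default (`na.action = na.omit`), which matters if the age column contains `NA`.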
4c: t-Test:
H2: The Titanic survivors were younger than the passengers who died.
head(titanic.df)
##   Survived Pclass    Sex  Age SibSp Parch    Fare Embarked
## 1        0      3   male 22.0     1     0  7.2500        S
## 2        1      1 female 38.0     1     0 71.2833        C
## 3        1      3 female 26.0     0     0  7.9250        S
## 4        1      1 female 35.0     1     0 53.1000        S
## 5        0      3   male 35.0     0     0  8.0500        S
## 6        0      3   male 29.7     0     0  8.4583        Q
table(titanic.df$Survived)
##
##   0   1
## 549 340
summary(titanic.df$Age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    0.40   22.00   29.70   29.65   35.00   80.00
survived.mean
##   Survived or Not      Age
## 1               0 30.41530
## 2               1 28.42382
boxplot(Age ~ Survived, data=titanic.df, yaxt="n", horizontal = TRUE, ylab="Survived or Not", xlab="Age", main="Comparison of Age of survivors and people who died", col=c("blue", "turquoise"))
axis(side=2, at=c(1,2), labels=c("Died", "Survived"))
log.transformed.Age <- log(titanic.df$Age)
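The logarithm is only defined for positive values. The minimum age reported by `summary` above is 0.40, so the transform is safe for this data. The sketch below shows one way to guard the transform and to keep the result as a column, so that it stays aligned with `Survived`. It uses a hypothetical `toy.df` with made-up values, and the column name `LogAge` is likewise an assumption.

```r
# Hypothetical toy data (made-up values), standing in for titanic.df
toy.df <- data.frame(
  Survived = c(0, 0, 0, 1, 1, 1),
  Age      = c(22, 35, 0.4, 38, 26, 20)
)

# log() gives -Inf for 0 and NaN for negative input, so check first
stopifnot(all(toy.df$Age > 0, na.rm = TRUE))

# Store the transformed ages next to the grouping variable
toy.df$LogAge <- log(toy.df$Age)
toy.df
```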
Hypothesis 2: The Titanic survivors were younger than the passengers who died.
Null hypothesis: There is no difference between the mean (log-transformed) age of the Titanic survivors and that of the passengers who died.
t.test(log.transformed.Age ~ titanic.df$Survived, var.equal = TRUE)
##
## Two Sample t-test
##
## data: log.transformed.Age by titanic.df$Survived
## t = 3.844, df = 887, p-value = 0.0001297
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.09102778 0.28094770
## sample estimates:
## mean in group 0 mean in group 1
##        3.304318        3.118330