T-Test Analysis based on the “Sinking of the RMS Titanic”

This report presents a t-test analysis comparing the ages of the Titanic survivors with the ages of the passengers who died.

setwd("D:/desktop/Data Analytics internship-sameer mathur/work/datasets")
titanic.df <- read.csv("Titanic Data.csv")
View(titanic.df)

4b: Table showing the average age of the survivors and the average age of the passengers who died

survived.mean <- aggregate(titanic.df$Age, list(titanic.df$Survived), mean)
colnames(survived.mean)[2] <- "Age"
colnames(survived.mean)[1] <- "Survived or Not"
survived.mean
##   Survived or Not      Age
## 1               0 30.41530
## 2               1 28.42382

The average age of the survivors (Survived = 1) is about 28.4 years, and the average age of the passengers who died (Survived = 0) is about 30.4 years.
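The same kind of table can be built with explicit group labels, which makes it harder to mix up the 0 and 1 rows. The sketch below is only an illustration: it uses the six rows printed by head(titanic.df) in this report rather than the full CSV, so its means are not the report's means, and the names toy.df and group.means are assumptions introduced here.

```r
# Illustrative only: the six rows shown by head(titanic.df), not the full data.
toy.df <- data.frame(
  Survived = c(0, 1, 1, 1, 0, 0),
  Age      = c(22.0, 38.0, 26.0, 35.0, 35.0, 29.7)
)

# tapply() returns one mean per level of Survived, ordered 0 then 1.
group.means <- tapply(toy.df$Age, toy.df$Survived, mean)

# Relabel so the output reads "Died" / "Survived" instead of 0 / 1.
names(group.means) <- c("Died", "Survived")
group.means
```

Applied to the full data, the same call would be tapply(titanic.df$Age, titanic.df$Survived, mean).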

4c: t-Test:

H2: The Titanic survivors were younger than the passengers who died.

  1. Inspecting the structure of the data
head(titanic.df)
##   Survived Pclass    Sex  Age SibSp Parch    Fare Embarked
## 1        0      3   male 22.0     1     0  7.2500        S
## 2        1      1 female 38.0     1     0 71.2833        C
## 3        1      3 female 26.0     0     0  7.9250        S
## 4        1      1 female 35.0     1     0 53.1000        S
## 5        0      3   male 35.0     0     0  8.0500        S
## 6        0      3   male 29.7     0     0  8.4583        Q
  2. Summary statistics
table(titanic.df$Survived)
## 
##   0   1 
## 549 340
summary(titanic.df$Age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.40   22.00   29.70   29.65   35.00   80.00
  3. Average age of survivors and people who died
survived.mean
##   Survived or Not      Age
## 1               0 30.41530
## 2               1 28.42382
  4. Graphical representation
boxplot(Age ~ Survived, data=titanic.df, yaxt="n", horizontal = TRUE, ylab="Survived or Not", xlab="Age", main="Comparison of Age of survivors and people who died", col=c("blue", "turquoise"))
axis(side=2, at=c(1,2), labels=c("Died", "Survived"))

log.transformed.Age <- log(titanic.df$Age)

Hypothesis 2: The Titanic survivors were younger than the passengers who died.

Null hypothesis: There is no significant difference between the ages of the Titanic survivors and the ages of those who died.
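H2 is a directional claim, whereas t.test() uses a two-sided alternative unless told otherwise. A minimal sketch of how a one-sided test could be requested is given here, using only the six rows printed by head(titanic.df) so that it runs without the CSV. The names toy.df and one.sided are assumptions made for illustration, and the resulting p-value says nothing about the full data set.

```r
# Illustrative only: the six rows shown by head(titanic.df), not the full data.
toy.df <- data.frame(
  Survived = c(0, 1, 1, 1, 0, 0),
  Age      = c(22.0, 38.0, 26.0, 35.0, 35.0, 29.7)
)

# Groups are ordered 0 (died), then 1 (survived), so the tested difference is
# mean(log age, died) - mean(log age, survived).
# alternative = "greater" therefore encodes "those who died were older",
# which is the same statement as H2 ("survivors were younger").
one.sided <- t.test(log(Age) ~ Survived, data = toy.df,
                    var.equal = TRUE, alternative = "greater")

one.sided$alternative
one.sided$p.value
```

The direction matters: with the groups in this order, alternative = "less" would test the opposite claim.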

t.test(log.transformed.Age~titanic.df$Survived,var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  log.transformed.Age by titanic.df$Survived
## t = 3.844, df = 887, p-value = 0.0001297
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.09102778 0.28094770
## sample estimates:
## mean in group 0 mean in group 1 
##        3.304318        3.118330
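Because the test was run on log-transformed ages, the estimates above are on the log scale. The following sketch shows one way to read them back in years. It relies only on the numbers printed in the output above; the variable names are assumptions introduced here. Exponentiating a mean of logs gives a geometric mean, and exponentiating a difference of log means gives a ratio of geometric means.

```r
# Values copied from the t.test output above (log scale).
t.stat   <- 3.844
deg.free <- 887
log.mean <- c(died = 3.304318, survived = 3.118330)
log.ci   <- c(lower = 0.09102778, upper = 0.28094770)

# Geometric mean age per group, in years.
geo.mean <- exp(log.mean)

# Ratio of geometric means (died / survived) and its 95% interval.
ratio    <- exp(log.mean[["died"]] - log.mean[["survived"]])
ratio.ci <- exp(log.ci)

# Two-sided p-value recomputed from t and df, and the one-sided p-value
# for the directional claim "those who died were older".
p.two.sided <- 2 * pt(t.stat, deg.free, lower.tail = FALSE)
p.one.sided <- pt(t.stat, deg.free, lower.tail = FALSE)

round(geo.mean, 1)
round(c(ratio = ratio, ratio.ci), 2)
signif(c(two.sided = p.two.sided, one.sided = p.one.sided), 3)
```

On these numbers the geometric mean age is roughly 27.2 years for the passengers who died and roughly 22.6 years for the survivors, a ratio of about 1.20 with a 95% interval of about 1.10 to 1.32. The p-value is far below 0.05 and the difference lies in the direction H2 predicts, so the null hypothesis is rejected and the data are consistent with H2.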