Titanic t-test

-> Reading the file

titanic.df <- read.csv("Titanic Data.csv")
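Later steps average the Age column. `mean()` returns `NA` if any value is missing, so it is worth counting missing values per column right after reading the file. The sketch below uses `read.csv(text = ...)` with three made-up rows in the same column layout, so it runs without the real CSV. The rows are hypothetical.

```r
# Hypothetical sample rows in the same column layout as "Titanic Data.csv"
csv_text <- "Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked
0,3,male,22,1,0,7.25,S
1,1,female,38,1,0,71.2833,C
1,3,female,,0,0,7.925,S"

df <- read.csv(text = csv_text)

# Count missing values per column (a blank numeric field is read as NA)
na_counts <- colSums(is.na(df))
print(na_counts)

# mean() needs na.rm = TRUE to skip the missing age
mean(df$Age, na.rm = TRUE)
```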

-> Viewing the data frame

View(titanic.df)

-> The column Survived holds two values (0 and 1): 0 = did not survive, 1 = survived.

-> Labeling

titanic.df$Survived <- factor(titanic.df$Survived, levels = c(0, 1),
                              labels = c("Died", "Survived"))
head(titanic.df)

##   Survived Pclass    Sex  Age SibSp Parch    Fare Embarked
## 1     Died      3   male 22.0     1     0  7.2500        S
## 2 Survived      1 female 38.0     1     0 71.2833        C
## 3 Survived      3 female 26.0     0     0  7.9250        S
## 4 Survived      1 female 35.0     1     0 53.1000        S
## 5     Died      3   male 35.0     0     0  8.0500        S
## 6     Died      3   male 29.7     0     0  8.4583        Q
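In `factor()`, `levels` lists the stored values in order and `labels` gives the name printed for each one. The vector below is a made-up set of 0/1 codes that shows the mapping and the resulting counts.

```r
# Hypothetical 0/1 survival codes
survived_codes <- c(0, 1, 1, 1, 0, 0)

# 0 is mapped to "Died" and 1 to "Survived", in that order
survived <- factor(survived_codes, levels = c(0, 1),
                   labels = c("Died", "Survived"))

print(levels(survived))   # the two labels, in level order
print(table(survived))    # number of passengers per label
```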

4b) Computing the average age of survivors and of people who died, and storing each passenger's group average in a new column (Avgage)

titanic.age.avg <- aggregate(titanic.df$Age , list(titanic.df$Survived) , mean)
titanic.age.avg

##    Group.1        x
## 1     Died 30.41530
## 2 Survived 28.42382
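`aggregate()` also accepts a formula. With a formula, the result keeps the original column names (Survived, Age) instead of the generic Group.1 and x. By default, the formula method also drops rows with a missing Age. The data frame below is a small made-up stand-in for titanic.df.

```r
# Hypothetical ages and survival labels
df <- data.frame(
  Age = c(22, 38, 26, 35, 35, 29),
  Survived = factor(c("Died", "Survived", "Survived", "Survived", "Died", "Died"),
                    levels = c("Died", "Survived"))
)

# Mean Age per Survived group; columns are named Survived and Age
age_avg <- aggregate(Age ~ Survived, data = df, FUN = mean)
print(age_avg)
```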

View(titanic.age.avg)
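# Look up each passenger's group mean: the factor's integer codes select row 1 (Died) or row 2 (Survived)
titanic.df$Avgage <- titanic.age.avg[titanic.df$Survived, 2]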
View(titanic.df)
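Building Avgage this way relies on R indexing rows by a factor's integer codes (Died = 1, Survived = 2), not by its labels. Base R's `ave()` computes the same per-row group mean directly, without a separate summary table. The sketch below uses made-up data and checks that the two approaches agree.

```r
# Hypothetical data in the same shape as titanic.df
df <- data.frame(
  Age = c(22, 38, 26, 35, 35, 29),
  Survived = factor(c(0, 1, 1, 1, 0, 0), levels = c(0, 1),
                    labels = c("Died", "Survived"))
)

age_avg <- aggregate(df$Age, list(df$Survived), mean)

# Indexing by a factor uses its integer codes: row 1 for Died, row 2 for Survived
avg_by_index <- age_avg[df$Survived, 2]

# ave() returns, for every row, the mean Age of that row's group
avg_by_ave <- ave(df$Age, df$Survived, FUN = mean)

all.equal(avg_by_index, avg_by_ave)
```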

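-> The t-test below is an independent (unpaired) two-sample test. It assumes each group is approximately normally distributed and, with var.equal = TRUE, that the groups have equal variances. t.test() needs numeric vectors, so the Survived column is converted from factor to numeric. After the conversion the labels become level codes: 1 = Died and 2 = Survived.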

str(titanic.df$Survived)

##  Factor w/ 2 levels "Died","Survived": 1 2 2 2 1 1 1 1 2 2 ...

titanic.df$Survived <- as.numeric(titanic.df$Survived)
str(titanic.df$Survived)

##  num [1:889] 1 2 2 2 1 1 1 1 2 2 ...
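`as.numeric()` on a factor returns the internal level codes (1, 2, ...), not the original 0/1 values. That is why Died becomes 1 and Survived becomes 2. `as.numeric(as.character(...))` would give `NA` here, because the labels are words. To get the 0/1 coding back, subtract 1 or compare against the label. A short sketch on a made-up vector:

```r
survived <- factor(c(0, 1, 1, 0), levels = c(0, 1),
                   labels = c("Died", "Survived"))

codes <- as.numeric(survived)          # level codes: Died = 1, Survived = 2
zero_one <- as.numeric(survived) - 1   # back to 0 = Died, 1 = Survived

# Comparing against the label gives the same 0/1 coding
zero_one_alt <- as.numeric(survived == "Survived")
```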

4c) Running the t-test

t.test(titanic.df$Survived, titanic.df$Avgage, var.equal = TRUE, paired = FALSE)

## 
##  Two Sample t-test
## 
## data:  titanic.df$Survived and titanic.df$Avgage
## t = -777.9, df = 1776, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -28.34248 -28.19992
## sample estimates:
## mean of x mean of y 
##  1.382452 29.653656
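With var.equal = TRUE, t.test() uses the pooled-variance (Student) statistic. In the formulas below, x̄ and ȳ are the sample means, s_1^2 and s_2^2 the sample variances, and n_1 and n_2 the sample sizes. Both vectors here have 889 values, so the degrees of freedom work out to the 1776 reported above. The observed difference 1.382452 - 29.653656 = -28.271204 also sits at the midpoint of the reported confidence interval.

```latex
t = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2},
\qquad
\mathrm{df} = n_1 + n_2 - 2 = 889 + 889 - 2 = 1776
```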

-> Interpreting the results of the t-test

The p-value (< 2.2e-16) is less than 0.05, so the null hypothesis of equal means is rejected (alternative hypothesis: the true difference in means is not 0).

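*Here x is the numeric Survived code (mean 1.38) and y is Avgage (mean 29.65). The test therefore compares those two columns, not the ages of survivors with the ages of people who died. On its own, the significant result does not show an age difference between the two groups.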
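One way to compare ages directly is a two-sample t-test of Age grouped by Survived. The sketch below uses the formula interface with R's default Welch test (var.equal = FALSE), which does not assume equal variances. The data frame is a made-up stand-in for titanic.df. The manual calculation at the end only shows what the statistic measures: the mean age of the first level (Died) minus that of the second (Survived), divided by its standard error.

```r
# Hypothetical data in the same shape as titanic.df
df <- data.frame(
  Age = c(30, 45, 28, 52, 40, 22, 18, 35, 27, 14),
  Survived = factor(rep(c("Died", "Survived"), each = 5),
                    levels = c("Died", "Survived"))
)

# Welch two-sample t-test of mean Age, Died minus Survived
res <- t.test(Age ~ Survived, data = df)
print(res)

# The same statistic by hand
died <- df$Age[df$Survived == "Died"]
surv <- df$Age[df$Survived == "Survived"]
se <- sqrt(var(died) / length(died) + var(surv) / length(surv))
t_manual <- (mean(died) - mean(surv)) / se
```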

-> Average age comparison (1 = Died, 2 = Survived)

titanic.age.avg <- aggregate(titanic.df$Age , list(titanic.df$Survived) , mean)
titanic.age.avg

##   Group.1        x
## 1       1 30.41530
## 2       2 28.42382

*The average ages support the hypothesis “The Titanic survivors were younger than the passengers who died.”: survivors averaged 28.42 years, compared with 30.42 years for those who died.
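The hypothesis is directional (survivors were younger), so a one-sided test matches it more closely than a two-sided one. With the factor levels ordered Died, Survived, the formula interface tests mean(Died) minus mean(Survived). So `alternative = "greater"` corresponds to survivors being younger. The data below is made up.

```r
# Hypothetical data in the same shape as titanic.df
df <- data.frame(
  Age = c(30, 45, 28, 52, 40, 22, 18, 35, 27, 14),
  Survived = factor(rep(c("Died", "Survived"), each = 5),
                    levels = c("Died", "Survived"))
)

two_sided <- t.test(Age ~ Survived, data = df)
one_sided <- t.test(Age ~ Survived, data = df, alternative = "greater")

# When the observed difference is in the hypothesised direction (t > 0),
# the one-sided p-value is half the two-sided one
one_sided$p.value
```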


Pooja Gundu

December 11, 2017