Setting the working directory and viewing the data frame

setwd("C:\\Users\\harsh\\Desktop\\r")
titanic.df <- read.csv("Titanic Data.csv", header = TRUE, sep = ",")
View(titanic.df)

Labelling survival status and viewing the mean age of passengers (survived and died)

titanic.df$Survived <- factor(titanic.df$Survived, levels = c(0, 1), labels = c("Died", "Survived"))
head(titanic.df)

##   Survived Pclass    Sex  Age SibSp Parch    Fare Embarked
## 1     Died      3   male 22.0     1     0  7.2500        S
## 2 Survived      1 female 38.0     1     0 71.2833        C
## 3 Survived      3 female 26.0     0     0  7.9250        S
## 4 Survived      1 female 35.0     1     0 53.1000        S
## 5     Died      3   male 35.0     0     0  8.0500        S
## 6     Died      3   male 29.7     0     0  8.4583        Q

by(titanic.df$Age, titanic.df$Survived, mean)

## titanic.df$Survived: Died
## [1] 30.4153
## -------------------------------------------------------- 
## titanic.df$Survived: Survived
## [1] 28.42382
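
The group means above can also be computed with tapply(), which returns a plain named vector and lets missing ages be skipped explicitly. The sketch below uses a small made-up data frame (toy.df is hypothetical, not the Titanic file) so that it runs on its own; group.means and group.counts are illustrative names.

```r
# Toy data standing in for titanic.df; the values are made up for illustration
toy.df <- data.frame(
  Survived = factor(c(0, 1, 1, 0, 1, 0), levels = c(0, 1),
                    labels = c("Died", "Survived")),
  Age = c(22, 38, 26, 35, NA, 54)
)

# Mean age per group, ignoring missing ages
group.means <- tapply(toy.df$Age, toy.df$Survived, mean, na.rm = TRUE)

# Number of non-missing ages per group, to show how many values each mean uses
group.counts <- tapply(!is.na(toy.df$Age), toy.df$Survived, sum)

print(group.means)
print(group.counts)
```

Without na.rm = TRUE, a single missing age would make the mean of its whole group NA, so it is worth checking the counts alongside the means.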

Box plot to check the variance

boxplot(Age ~ Survived, data = titanic.df)
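
A box plot gives only a visual impression of spread. As a rough numeric companion, the per-group variances can be printed and compared with an F test via var.test(). This is a minimal sketch on simulated ages (toy.df and the chosen means and standard deviations are assumptions, not the Titanic data), and the F test is known to be sensitive to non-normal data, so it should be read as a guide only.

```r
# Simulated ages for illustration only; not the Titanic data
set.seed(1)
toy.df <- data.frame(
  Survived = factor(rep(c(0, 1), each = 50), levels = c(0, 1),
                    labels = c("Died", "Survived")),
  Age = c(rnorm(50, mean = 30, sd = 14), rnorm(50, mean = 28, sd = 15))
)

# Sample variance of age within each group
group.vars <- tapply(toy.df$Age, toy.df$Survived, var)
print(group.vars)

# F test of the null hypothesis that the two group variances are equal
f.result <- var.test(Age ~ Survived, data = toy.df)
print(f.result$p.value)
```

Whatever this check shows, t.test() uses the Welch version by default, which does not assume equal variances.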

T-TEST

t.test(Age ~ Survived, data=titanic.df)

## 
##  Welch Two Sample t-test
## 
## data:  Age by Survived
## t = 2.1816, df = 667.56, p-value = 0.02949
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.1990628 3.7838912
## sample estimates:
##     mean in group Died mean in group Survived 
##               30.41530               28.42382
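
To make the t and df values in the output above less opaque, the Welch statistic and its Welch-Satterthwaite degrees of freedom can be reproduced by hand and compared with t.test(). The ages below are made up for illustration (they are not the Titanic data), and all variable names are hypothetical.

```r
# Hypothetical ages, made up for illustration (not the Titanic data)
died     <- c(22, 35, 54, 40, 28, 45, 31, 60)
survived <- c(38, 26, 19, 24, 30, 4, 27, 35)

# Variance of each sample mean: s^2 / n
v.died     <- var(died) / length(died)
v.survived <- var(survived) / length(survived)

# Welch t statistic: difference in means over its standard error
t.manual <- (mean(died) - mean(survived)) / sqrt(v.died + v.survived)

# Welch-Satterthwaite approximation to the degrees of freedom
df.manual <- (v.died + v.survived)^2 /
  (v.died^2 / (length(died) - 1) + v.survived^2 / (length(survived) - 1))

# t.test() runs the Welch test by default (var.equal = FALSE)
welch <- t.test(died, survived)

print(c(t = t.manual, df = df.manual))
print(all.equal(t.manual, unname(welch$statistic)))
print(all.equal(df.manual, unname(welch$parameter)))
```

The fractional degrees of freedom in the report (667.56) come from this approximation; a pooled-variance t-test would give a whole number instead.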

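The p-value is 0.029, which is below 0.05, so we reject the null hypothesis that the mean ages of the two groups are equal at the 5% significance level. On average, Titanic survivors were about 2 years younger than the passengers who died (28.4 vs. 30.4 years), and the 95% confidence interval for the difference in mean age runs from 0.20 to 3.78 years.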
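
The test reported above is two-sided, while the conclusion is directional. If the direction had been chosen before looking at the data, a one-sided test could be requested with alternative = "greater", which for a formula tests whether the first factor level (Died) has the larger mean. A minimal sketch on made-up ages (toy.df is hypothetical, not the Titanic data):

```r
# Hypothetical ages, made up for illustration (not the Titanic data)
toy.df <- data.frame(
  Survived = factor(rep(c(0, 1), each = 8), levels = c(0, 1),
                    labels = c("Died", "Survived")),
  Age = c(22, 35, 54, 40, 28, 45, 31, 60,
          38, 26, 19, 24, 30, 4, 27, 35)
)

# Two-sided Welch test: are the mean ages different?
two.sided <- t.test(Age ~ Survived, data = toy.df)

# One-sided Welch test: is the mean age of "Died" greater than that of "Survived"?
one.sided <- t.test(Age ~ Survived, data = toy.df, alternative = "greater")

print(two.sided$p.value)
print(one.sided$p.value)
```

When the observed difference lies in the hypothesised direction, the one-sided p-value is half the two-sided one, which is why the direction should be fixed in advance rather than picked after seeing the means.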

Titanic_T.Test

Harsh Sandhu

December 11, 2017
