Introduction

In this markdown file, we continue exploring the Titanic Data.csv file and use it to gain a deeper understanding of the ages of the survivors and the deceased.

We first write the script that reads the required CSV file and calls attach() on it, so that its columns can be referenced directly by name.

setwd("~/Muyeena/Internship/Case Studies/Titanic")
titanic = read.csv("Titanic Data.csv")
#View(titanic)
attach(titanic)
str(titanic)
## 'data.frame':    889 obs. of  8 variables:
##  $ Survived: int  0 1 1 1 0 0 0 0 1 1 ...
##  $ Pclass  : int  3 1 3 1 3 3 1 3 3 2 ...
##  $ Sex     : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
##  $ Age     : num  22 38 26 35 35 29.7 54 2 27 14 ...
##  $ SibSp   : int  1 1 0 1 0 0 0 3 0 1 ...
##  $ Parch   : int  0 0 0 0 0 0 0 1 2 0 ...
##  $ Fare    : num  7.25 71.28 7.92 53.1 8.05 ...
##  $ Embarked: Factor w/ 3 levels "C","Q","S": 3 1 3 3 3 2 3 3 3 1 ...
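
attach() is convenient, but it places a copy of the columns on the search path, so later changes to the data frame are not reflected in the attached names, and those names can be masked by other objects. As a minimal sketch of the alternatives, the example below uses a small made-up data frame called toy (a hypothetical stand-in, not the real Titanic data) and reaches its columns without attaching anything.

```r
# Toy data frame standing in for titanic; the values are made up for illustration.
toy <- data.frame(
  Survived = c(0, 0, 0, 1, 1, 1),
  Age      = c(40, 35, 30, 20, 25, 30)
)

# Option 1: refer to a column explicitly with $
mean_age_dollar <- mean(toy$Age)

# Option 2: evaluate an expression inside the data frame with with()
mean_age_with <- with(toy, mean(Age))

# Both routes give the same value and leave the search path untouched
identical(mean_age_dollar, mean_age_with)
```

Modelling functions such as aggregate() and t.test() also accept a data argument, which serves the same purpose.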

Average age of the survivors and deceased

We use the aggregate() function to get a table which contains the average age of the survivors and the average age of the people who died.

mean.age = aggregate(Age ~ Survived, data = titanic, mean)
mean.age
##   Survived      Age
## 1        0 30.41530
## 2        1 28.42382
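
To make the grouping step more concrete, here is a small sketch on a made-up data frame called toy (hypothetical values, not the Titanic data). It computes the group means once with the formula interface of aggregate() and once with tapply(), which should agree.

```r
# Toy data frame; the values are made up for illustration.
toy <- data.frame(
  Survived = c(0, 0, 0, 1, 1, 1),
  Age      = c(40, 35, 30, 20, 25, 30)
)

# Formula interface: one row per level of Survived, with the mean of Age
by_formula <- aggregate(Age ~ Survived, data = toy, FUN = mean)

# Equivalent base R approach: apply mean() to Age within each Survived group
by_tapply <- tapply(toy$Age, toy$Survived, mean)

by_formula
by_tapply
```

One difference worth keeping in mind: the formula interface of aggregate() drops rows with missing values by default, whereas mean() inside tapply() needs na.rm = TRUE to do the same.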

The following code is used to visualize the ages of the survivors and those of the deceased.

boxplot(Age ~ Survived, data = titanic,
        horizontal = TRUE,
        yaxt = "n",
        ylab = "Survival Status",
        xlab = "Age",
        col = c("red","blue"),
        main = "Comparison of ages of survivors and victims")
axis(side = 2, at=c(1,2), labels = c("Deceased","Survivors"))
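
As a possible variation, the group labels can be attached to the data instead of being drawn with a separate axis() call. The sketch below uses a made-up data frame called toy and a hypothetical Status column; it calls boxplot() with plot = FALSE, so it only returns the numbers a box plot would display.

```r
# Toy data frame; the values are made up for illustration.
toy <- data.frame(
  Survived = c(0, 0, 0, 1, 1, 1),
  Age      = c(40, 35, 30, 20, 25, 30)
)

# Turn the 0/1 codes into a labelled factor (Status is a hypothetical column name)
toy$Status <- factor(toy$Survived,
                     levels = c(0, 1),
                     labels = c("Deceased", "Survivors"))

# plot = FALSE skips drawing and returns the summary statistics instead
bp <- boxplot(Age ~ Status, data = toy, plot = FALSE)

bp$names  # group labels, taken from the factor levels
bp$stats  # one column per group: lower whisker, lower hinge, median, upper hinge, upper whisker
```

Dropping plot = FALSE would draw the plot with these labels already on the group axis.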


Use R to run a t-test to test the given hypothesis

The given hypothesis is -

The Titanic survivors were younger than the passengers who died.

Therefore, the null hypothesis will be -

There is no significant difference between the mean age of the survivors and that of the deceased.

To test the hypothesis, we call the t.test() function in R. With the default settings used here, it performs a two-sided Welch Two Sample t-test.

# The bare variable names work here only because titanic was attached earlier.
p = t.test(Age ~ Survived)
p
## 
##  Welch Two Sample t-test
## 
## data:  Age by Survived
## t = 2.1816, df = 667.56, p-value = 0.02949
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.1990628 3.7838912
## sample estimates:
## mean in group 0 mean in group 1 
##        30.41530        28.42382
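
The hypothesis stated earlier is directional (survivors younger), while the default test is two-sided. A minimal sketch of a one-sided version follows, run on a made-up data frame called toy rather than the Titanic data. With the formula Age ~ Survived, the difference is taken as group 0 minus group 1, so "survivors are younger" corresponds to alternative = "greater".

```r
# Toy data frame; the values are made up for illustration.
toy <- data.frame(
  Survived = c(0, 0, 0, 1, 1, 1),
  Age      = c(40, 35, 30, 20, 25, 30)
)

# Default: two-sided Welch test of mean(Age | 0) - mean(Age | 1) against 0
two_sided <- t.test(Age ~ Survived, data = toy)

# One-sided: H1 is that group 0 (deceased) has the larger mean age
one_sided <- t.test(Age ~ Survived, data = toy, alternative = "greater")

two_sided$p.value
one_sided$p.value
```

When the observed difference lies in the hypothesized direction, the one-sided p-value is half the two-sided one.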

Interpret the t-test results

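In the above t-test, the p-value obtained is 0.0294879. This value is less than 0.05 but greater than 0.01.

Therefore, the conclusion depends on the significance level chosen. At the conventional 5% level, we reject the null hypothesis: the mean age of the survivors (28.42) is significantly lower than that of the deceased (30.42), and the 95 percent confidence interval for the difference (0.20 to 3.78 years) does not contain 0.

At the stricter 1% level, we cannot reject the null hypothesis. Hence, the data give moderate, but not strong, evidence that the survivors were younger than the passengers who died.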
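
The comparison of the p-value against a significance level can be written out explicitly. The sketch below hard-codes the p-value reported above; in the session itself it would be available as p$p.value. The names alphas and decision are illustrative.

```r
# p-value reported by the Welch test above (available as p$p.value in the session)
p_value <- 0.0294879

# Two commonly used significance levels
alphas <- c(0.05, 0.01)

# Reject the null hypothesis when the p-value falls below the chosen level
decision <- ifelse(p_value < alphas, "reject H0", "fail to reject H0")

# Label each decision with its significance level
setNames(decision, alphas)
```

This makes it visible that the outcome changes between the 5% and 1% levels, so the level should be fixed before looking at the result.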