Dataset description

Consider the Titanic data set, which lists details of the passengers who boarded the ship, including ticket class, fare, and survival status.

Variables

The data set contains both categorical and numeric variables. For the Survived variable, 0 indicates that the passenger did not survive and 1 indicates that the passenger survived, so its mean (0.3825) is the proportion of passengers who survived. Summary statistics for the variables are shown below.

summary(titan)
##     Survived          Pclass          Sex           Age       
##  Min.   :0.0000   Min.   :1.000   female:312   Min.   : 0.40  
##  1st Qu.:0.0000   1st Qu.:2.000   male  :577   1st Qu.:22.00  
##  Median :0.0000   Median :3.000                Median :29.70  
##  Mean   :0.3825   Mean   :2.312                Mean   :29.65  
##  3rd Qu.:1.0000   3rd Qu.:3.000                3rd Qu.:35.00  
##  Max.   :1.0000   Max.   :3.000                Max.   :80.00  
##      SibSp            Parch             Fare         Embarked
##  Min.   :0.0000   Min.   :0.0000   Min.   :  0.000   C:168   
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:  7.896   Q: 77   
##  Median :0.0000   Median :0.0000   Median : 14.454   S:644   
##  Mean   :0.5242   Mean   :0.3825   Mean   : 32.097           
##  3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.: 31.000           
##  Max.   :8.0000   Max.   :6.0000   Max.   :512.329

The case

Now we analyze the average age of survivors and non-survivors of the accident.

avg_age <- aggregate(titan$Age, list(titan$Survived), mean)
avg_age
##   Group.1        x
## 1       0 30.41530
## 2       1 28.42382

The table shows that the average age of non-survivors (30.42 years) is higher than the average age of survivors (28.42 years).
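The same group means can also be computed with the formula interface of aggregate(), which keeps the original column names in the result. The sketch below uses a small invented data frame called toy (a stand-in, not the real titan data), so its numbers are illustrative only.

```r
# Hypothetical toy data standing in for titan; the values are invented.
toy <- data.frame(
  Survived = c(0, 0, 0, 1, 1, 1),
  Age      = c(40, 30, 20, 35, 25, 15)
)

# Formula interface: mean of Age within each level of Survived.
avg_age <- aggregate(Age ~ Survived, data = toy, FUN = mean)
avg_age

# Difference in mean age (non-survivors minus survivors).
age_diff <- avg_age$Age[avg_age$Survived == 0] -
  avg_age$Age[avg_age$Survived == 1]
age_diff
```

With the real data, replacing toy with titan should reproduce the two means shown above, labelled Survived and Age instead of Group.1 and x.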

Hypothesis

H2: The Titanic survivors were younger than the passengers who died.

t-test

A two-group independent t-test can be used to test this hypothesis. Here, we assume that the two groups, survivors and non-survivors, are independent and that age in each group is sampled from a normal population. By default, t.test() runs Welch's version of the test, which does not assume equal variances in the two groups. The following code carries out the t-test.
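One way to probe the normality assumption is a Shapiro-Wilk test on age within each group. The sketch below is only an illustration: it runs on simulated data in a hypothetical data frame called toy, not on titan, and the choice of shapiro.test() is an assumption about how one might check this rather than part of the original analysis.

```r
# Simulated stand-in for titan (hypothetical data, fixed seed for repeatability).
set.seed(1)
toy <- data.frame(
  Survived = rep(c(0, 1), each = 50),
  Age      = c(rnorm(50, mean = 30, sd = 10), rnorm(50, mean = 28, sd = 10))
)

# Shapiro-Wilk p-value for Age within each survival group.
# A small p-value suggests a departure from normality in that group.
shapiro_p <- tapply(toy$Age, toy$Survived,
                    function(a) shapiro.test(a)$p.value)
shapiro_p
```

With several hundred passengers per group, the t-test is generally regarded as fairly robust to moderate departures from normality, so this check is a diagnostic rather than a strict gate.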

t.test(titan$Age~titan$Survived)
## 
##  Welch Two Sample t-test
## 
## data:  titan$Age by titan$Survived
## t = 2.1816, df = 667.56, p-value = 0.02949
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.1990628 3.7838912
## sample estimates:
## mean in group 0 mean in group 1 
##        30.41530        28.42382
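
The output above is for a two-sided test, while H2 is directional. A one-sided version could be sketched as follows, again on simulated data in a hypothetical data frame called toy rather than on titan. With a numeric 0/1 grouping variable, group 0 is taken first, so alternative = "greater" tests whether the mean age of non-survivors exceeds that of survivors.

```r
# Simulated stand-in for titan (hypothetical data, fixed seed for repeatability).
set.seed(42)
toy <- data.frame(
  Survived = rep(c(0, 1), times = c(60, 40)),
  Age      = c(rnorm(60, mean = 31, sd = 12), rnorm(40, mean = 28, sd = 12))
)

# One-sided Welch t-test.
# H0: mean age (Survived = 0) <= mean age (Survived = 1)
# H1: mean age (Survived = 0) >  mean age (Survived = 1)
res <- t.test(Age ~ Survived, data = toy, alternative = "greater")
res$method
res$p.value
```

When the observed difference lies in the hypothesized direction, as it does in the titan output, the one-sided p-value is half the two-sided one, which here would be about 0.015.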

Result of t-test

The results show a p-value of 0.029 (p < 0.05), so we reject the null hypothesis of equal mean ages at the 5% significance level. There is a statistically significant difference between the average ages of non-survivors and survivors, and the 95% confidence interval for the difference (0.20 to 3.78 years) excludes zero.

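In view of this, H2 is supported: on average, the survivors were younger than the passengers who died. Because the reported test is two-sided, it establishes that the means differ; the direction follows from the sample means (30.42 years for non-survivors versus 28.42 years for survivors).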