Consider the Titanic data set, which lists details of the passengers who boarded the ship along with their ticket class, fare, survival status, etc.
It contains both categorical and numeric variables. For the Survived variable, ‘0’ indicates ‘didn’t survive’ and ‘1’ indicates ‘did survive’. Summary statistics for the variables are shown below:
summary(titan)
##     Survived          Pclass          Sex           Age
##  Min.   :0.0000   Min.   :1.000   female:312   Min.   : 0.40
##  1st Qu.:0.0000   1st Qu.:2.000   male  :577   1st Qu.:22.00
##  Median :0.0000   Median :3.000                Median :29.70
##  Mean   :0.3825   Mean   :2.312                Mean   :29.65
##  3rd Qu.:1.0000   3rd Qu.:3.000                3rd Qu.:35.00
##  Max.   :1.0000   Max.   :3.000                Max.   :80.00
##      SibSp            Parch             Fare          Embarked
##  Min.   :0.0000   Min.   :0.0000   Min.   :  0.000   C:168
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:  7.896   Q: 77
##  Median :0.0000   Median :0.0000   Median : 14.454   S:644
##  Mean   :0.5242   Mean   :0.3825   Mean   : 32.097
##  3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.: 31.000
##  Max.   :8.0000   Max.   :6.0000   Max.   :512.329
Now we will analyze the average age of survivors and non-survivors of the accident.
avg_age <- aggregate(titan$Age, list(titan$Survived), mean)
avg_age
## Group.1 x
## 1 0 30.41530
## 2 1 28.42382
From the above table, it is evident that the average age of non-survivors (about 30.4 years) is higher than the average age of passengers who survived the accident (about 28.4 years).
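To make the grouping step concrete, here is a minimal sketch of the same per-group mean on a small made-up data frame. The data frame `toy` and its six ages are hypothetical and are not taken from the Titanic data; the formula interface of `aggregate()` and the `tapply()` call are shown only as alternative ways to get the same grouped means.

```r
# Hypothetical toy data: three non-survivors (0) and three survivors (1).
toy <- data.frame(
  Survived = c(0, 0, 0, 1, 1, 1),
  Age      = c(22, 40, 31, 19, 35, 27)
)

# Formula interface: mean of Age within each level of Survived.
# The result keeps the column names Survived and Age.
avg_age_toy <- aggregate(Age ~ Survived, data = toy, FUN = mean)
print(avg_age_toy)

# tapply() gives the same group means as a named vector.
avg_by_tapply <- tapply(toy$Age, toy$Survived, mean)
print(avg_by_tapply)
```

Compared with `aggregate(x, list(g), mean)`, the formula form labels the output columns with the variable names rather than `Group.1` and `x`, which can make the table easier to read.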
H2: The Titanic survivors were younger than the passengers who died.
A two-group independent t-test can be used to test the above hypothesis. Here, we assume that the two groups, i.e. survivors and non-survivors, are independent and that the ages in each group are sampled from normal populations. By default, R's t.test() runs the Welch variant, which does not assume equal variances in the two groups. The following code carries out the t-test for this hypothesis.
t.test(titan$Age~titan$Survived)
##
## Welch Two Sample t-test
##
## data: titan$Age by titan$Survived
## t = 2.1816, df = 667.56, p-value = 0.02949
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.1990628 3.7838912
## sample estimates:
## mean in group 0 mean in group 1
## 30.41530 28.42382
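The above results show that the p-value is 0.03 (p < 0.05), hence we reject the null hypothesis of equal means at the 5% significance level. Based on this result, we can say that there is a significant difference between the average age of non-survivors and survivors. Note that the test as run is two-sided; the direction of the difference comes from the sample means (30.4 for non-survivors versus 28.4 for survivors) and from the 95% confidence interval for the difference, which lies entirely above zero.
In view of this, H2 is supported, i.e. on average the survivors were younger than the passengers who died.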
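Because H2 is directional while the reported test is two-sided, it can help to see how the reported numbers relate to a one-sided test. The sketch below uses only the statistics printed in the t-test output (t, df, and the two group means); the variable names are illustrative. It recomputes the two-sided p-value, derives the one-sided p-value for "non-survivors are older", and rebuilds the 95% confidence interval from the implied standard error.

```r
# Values copied from the Welch t-test output above.
t_stat        <- 2.1816
df            <- 667.56
mean_died     <- 30.41530   # mean age in group 0
mean_survived <- 28.42382   # mean age in group 1

# Two-sided p-value: probability of |T| at least as large as observed.
p_two_sided <- 2 * pt(-abs(t_stat), df)

# One-sided p-value for H2 (group 0 mean greater than group 1 mean).
p_one_sided <- pt(t_stat, df, lower.tail = FALSE)

# Rebuild the 95% confidence interval from the mean difference and
# the standard error implied by t = diff / se.
diff_means <- mean_died - mean_survived
se         <- diff_means / t_stat
ci         <- diff_means + c(-1, 1) * qt(0.975, df) * se

print(round(c(two_sided = p_two_sided, one_sided = p_one_sided), 4))
print(round(ci, 3))

# With the data frame available, the directional test could be requested
# directly (not run here, because `titan` is not defined in this sketch):
# t.test(Age ~ Survived, data = titan, alternative = "greater")
```

The two-sided value should agree with the reported p-value up to rounding, and the one-sided value is half of it because the observed t statistic is positive, which is the direction H2 predicts.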