->reading the file
titanic.df <- read.csv("Titanic Data.csv")
->viewing the dataframe
View(titanic.df)
->The Survived column takes two values: 0 = did not survive, 1 = survived
->Labeling the values
titanic.df$Survived <- factor(titanic.df$Survived, levels = c(0, 1),
                              labels = c("Died", "Survived"))
head(titanic.df)
##   Survived Pclass    Sex  Age SibSp Parch    Fare Embarked
## 1     Died      3   male 22.0     1     0  7.2500        S
## 2 Survived      1 female 38.0     1     0 71.2833        C
## 3 Survived      3 female 26.0     0     0  7.9250        S
## 4 Survived      1 female 35.0     1     0 53.1000        S
## 5     Died      3   male 35.0     0     0  8.0500        S
## 6     Died      3   male 29.7     0     0  8.4583        Q
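One way to confirm that the relabeling mapped 0 to "Died" and 1 to "Survived" is to cross-tabulate the raw codes against the new labels. The snippet below is a minimal sketch on a small made-up vector (`survived_raw` is hypothetical and stands in for the original 0/1 column), not the Titanic data itself.

```r
# Hypothetical 0/1 vector standing in for the raw Survived column
survived_raw <- c(0, 1, 1, 0, 1)

# Same relabeling as above: 0 -> "Died", 1 -> "Survived"
survived_lab <- factor(survived_raw, levels = c(0, 1),
                       labels = c("Died", "Survived"))

# Cross-tabulate raw codes against labels; every off-diagonal count should be 0
check <- table(raw = survived_raw, label = survived_lab)
print(check)
```

If any count appears off the diagonal, the `levels` and `labels` arguments are in a different order from each other.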
4b) Creating a table of the average age of survivors and of people who died, then adding it to the data frame as the column Avgage
titanic.age.avg <- aggregate(titanic.df$Age , list(titanic.df$Survived) , mean)
titanic.age.avg
##    Group.1        x
## 1     Died 30.41530
## 2 Survived 28.42382
View(titanic.age.avg)
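# Index the two-row table by the factor's level codes (1 = Died, 2 = Survived),
# so every passenger gets the mean age of their own group
titanic.df$Avgage <- titanic.age.avg[titanic.df$Survived, 2]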
View(titanic.df)
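The per-passenger group mean can also be computed in one step with base R's `ave()`, which returns a vector the same length as its input. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical) to show the idea. It is an alternative to the aggregate-then-index approach, not a description of what was run above.

```r
# Hypothetical data frame with the same two columns used in this step
toy <- data.frame(
  Age = c(20, 30, 40, 50),
  Survived = factor(c("Died", "Survived", "Died", "Survived"))
)

# ave() applies FUN within each group and repeats the result for every row
toy$Avgage <- ave(toy$Age, toy$Survived, FUN = mean)
print(toy)
```

Here the two "Died" rows both receive 30 and the two "Survived" rows both receive 40.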
->t.test() needs numeric vectors, so the Survived column is converted from factor to numeric. (The test run below is an independent two-sample t-test, paired = FALSE, not a dependent one.)
->Changing the format of the column (factor to num)
->Coding after conversion: 1 = Died, 2 = Survived
str(titanic.df$Survived)
## Factor w/ 2 levels "Died","Survived": 1 2 2 2 1 1 1 1 2 2 ...
titanic.df$Survived <- as.numeric(titanic.df$Survived)
str(titanic.df$Survived)
## num [1:889] 1 2 2 2 1 1 1 1 2 2 ...
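Note that `as.numeric()` on a factor returns the internal level codes (1, 2), not the original 0/1 values. The sketch below, on a made-up four-element factor `f` (hypothetical), shows the codes and one way to get back to 0/1 for a two-level factor.

```r
# Hypothetical factor built the same way as titanic.df$Survived
f <- factor(c(0, 1, 1, 0), levels = c(0, 1),
            labels = c("Died", "Survived"))

codes <- as.numeric(f)           # level codes: Died = 1, Survived = 2
original <- as.numeric(f) - 1    # back to 0/1 (valid for this two-level factor)
is_survivor <- f == "Survived"   # logical vector, often clearer than codes

print(codes)
print(original)
```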
4c) Running t-test
t.test(titanic.df$Survived , titanic.df$Avgage , var.equal = TRUE , paired = FALSE)
##
## Two Sample t-test
##
## data: titanic.df$Survived and titanic.df$Avgage
## t = -777.9, df = 1776, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -28.34248 -28.19992
## sample estimates:
## mean of x mean of y
## 1.382452 29.653656
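->Interpreting results of t-test
The p-value is less than 0.05, so the null hypothesis is rejected (alternative hypothesis: true difference in means is not 0).
*Caution: as called above, the test compares the survival codes (mean 1.38) with the Avgage column (mean 29.65), not the ages of the two groups. A significant result here therefore does not by itself show a difference between the average age of survivors and the average age of people who died; that needs a test of Age grouped by Survived.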
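To compare the ages of the two groups directly, the formula interface of `t.test()` can split Age by Survived. The following is a minimal sketch on made-up ages (the `toy` data frame is hypothetical, so its result says nothing about the Titanic data). It assumes Survived is still a factor with levels "Died" and "Survived" in that order. With `alternative = "greater"` the alternative hypothesis is that the first level (Died) has the larger mean, which corresponds to "survivors were younger".

```r
# Hypothetical ages, six passengers per group
toy <- data.frame(
  Age = c(34, 41, 29, 52, 38, 45, 22, 31, 27, 36, 19, 30),
  Survived = factor(rep(c("Died", "Survived"), each = 6),
                    levels = c("Died", "Survived"))
)

# Welch two-sample t-test (the default, var.equal = FALSE);
# H1: mean age of "Died" is greater than mean age of "Survived"
res <- t.test(Age ~ Survived, data = toy, alternative = "greater")
print(res)
```

Passing `var.equal = TRUE` would give the pooled-variance Student's t-test instead. Welch's version is used in this sketch because it does not assume equal variances in the two groups.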
->Average age comparison (1 = Died, 2 = Survived)
titanic.age.avg <- aggregate(titanic.df$Age , list(titanic.df$Survived) , mean)
titanic.age.avg
##   Group.1        x
## 1       1 30.41530
## 2       2 28.42382
*The group means are consistent with the hypothesis “The Titanic survivors were younger than the passengers who died”: survivors averaged 28.42 years, against 30.42 years for those who died.