Read the Titanic dataset.
titanic <- read.csv("Titanic Data.csv")
head(titanic) # first few rows of the data frame
## Survived Pclass Sex Age SibSp Parch Fare Embarked X X.1 X.2
## 1 0 3 male 22.0 1 0 7.2500 S NA NA NA
## 2 1 1 female 38.0 1 0 71.2833 C NA NA NA
## 3 1 3 female 26.0 0 0 7.9250 S NA NA NA
## 4 1 1 female 35.0 1 0 53.1000 S NA NA NA
## 5 0 3 male 35.0 0 0 8.0500 S NA NA NA
## 6 0 3 male 29.7 0 0 8.4583 Q NA NA NA
View the data to confirm that it matches what we saw in the Excel file.
View(titanic)
4b - Use R to create a table showing the average age of the survivors and the average age of the people who died.
aggregate(titanic$Age,by=list(Survived=titanic$Survived),mean)
## Survived x
## 1 0 30.41530
## 2 1 28.42382
## 3 334 NA
Another method, using by():
by(titanic$Age,titanic$Survived,mean)
## titanic$Survived: 0
## [1] 30.4153
## --------------------------------------------------------
## titanic$Survived: 1
## [1] 28.42382
## --------------------------------------------------------
## titanic$Survived: 334
## [1] NA
A third method: store the by() result, then pass it to ftable():
survivortable<-by(titanic$Age,list(titanic$Survived),mean)
survivortable
## : 0
## [1] 30.4153
## --------------------------------------------------------
## : 1
## [1] 28.42382
## --------------------------------------------------------
## : 334
## [1] NA
ftable(survivortable)
## survivortable 28.4238235294118 30.4153005464481
##
## 1 1
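Average age of survivors = 28.42382; average age of those who died = 30.4153. The extra group labelled 334 has no age data (NA) and matches neither outcome code (0 = died, 1 = survived), so it is ignored here.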
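The 334 group in the output above looks like a stray row in the CSV rather than a real passenger group, although that is only a guess from the printed tables. A minimal sketch of one way to keep such a row out of the summary is shown below. It uses a small made-up data frame, toy, as a stand-in for titanic; the values are hypothetical and chosen only to make the behaviour easy to check.

```r
# Hypothetical toy data standing in for the titanic data frame.
# The last row mimics the stray "334" row with a missing Age.
toy <- data.frame(
  Survived = c(0, 0, 1, 1, 334),
  Age      = c(30, 40, 20, 30, NA)
)

# Keep only rows whose Survived value is a valid code (0 = died, 1 = survived).
toy_clean <- subset(toy, Survived %in% c(0, 1))

# Mean age per group; na.rm = TRUE is passed on to mean() and skips missing ages.
age_by_survival <- aggregate(Age ~ Survived, data = toy_clean,
                             FUN = mean, na.rm = TRUE)
names(age_by_survival)[2] <- "MeanAge"
age_by_survival
```

Applied to the real data, the same subset() call on titanic would be expected to leave only the two rows for 0 and 1 in the table.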
4c - Use R to run a t-test to test the following hypothesis: H2: The Titanic survivors were younger than the passengers who died.
t.test(Age~Survived,data=titanic)
##
## Welch Two Sample t-test
##
## data: Age by Survived
## t = 2.1816, df = 667.56, p-value = 0.02949
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.1990628 3.7838912
## sample estimates:
## mean in group 0 mean in group 1
## 30.41530 28.42382
t.test(Age~Survived,data = titanic)$p.value
## [1] 0.02948791
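Here the p-value (0.0295) is below 0.05, so we reject the null hypothesis that the two groups have the same mean age. This test is two-sided, so on its own it only says that the average age of those who died differs from the average age of the survivors. The direction comes from the sample means: survivors averaged 28.42 years against 30.42 years for those who died, which is consistent with H2 that the Titanic survivors were younger than the passengers who died. Because H2 is directional, a one-sided test is the more direct check; since the observed difference lies in the hypothesized direction, its p-value is half the two-sided one, about 0.015.
Note: rejecting the null hypothesis does not prove that H2 is true. It means only that a difference this large would be unlikely if the two groups really had the same mean age.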
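Since H2 is a directional hypothesis, a one-sided Welch test can be requested through the alternative argument of t.test(). The sketch below uses a small made-up data frame, toy (hypothetical values, not the Titanic data), to show the call and how its p-value relates to the two-sided one.

```r
# Hypothetical toy data (not the real Titanic data):
# Survived is 0 (died) or 1 (survived).
toy <- data.frame(
  Survived = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  Age      = c(35, 40, 28, 45, 33, 22, 30, 25, 38, 27)
)

# With a formula, t.test() orders the groups by level (0, then 1), so
# alternative = "greater" tests mean age of group 0 > mean age of group 1,
# i.e. that those who died were older than the survivors.
one_sided <- t.test(Age ~ Survived, data = toy, alternative = "greater")
two_sided <- t.test(Age ~ Survived, data = toy)

one_sided$p.value
two_sided$p.value
```

On the real data the equivalent call would be t.test(Age ~ Survived, data = titanic, alternative = "greater"). When the observed difference lies in the hypothesized direction, the one-sided p-value is half the two-sided one.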