This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:
summary(cars)
##      speed           dist
##  Min.   : 4.0   Min.   :  2.00
##  1st Qu.:12.0   1st Qu.: 26.00
##  Median :15.0   Median : 36.00
##  Mean   :15.4   Mean   : 42.98
##  3rd Qu.:19.0   3rd Qu.: 56.00
##  Max.   :25.0   Max.   :120.00
#TASK 2b - Reading the dataset
#Read the data and store it in a data frame "titanic.df"
titanic.df <- read.csv("Titanic Data.csv")
View(titanic.df) #Inspect the data in the spreadsheet-style viewer
#TASK 3a Use R to count the total number of passengers on board the Titanic.
nrow(titanic.df) #Total number of passengers on board (one row per passenger)
## [1] 889
#TASK 3b Use R to count the number of passengers who survived the sinking of the Titanic.
table(titanic.df$Survived) #Counts by outcome: 0 = did not survive, 1 = survived
##
##   0   1
## 549 340
#TASK 3c Use R to measure the percentage of passengers who survived the sinking of the Titanic.
mytable<- with(titanic.df,table(Survived)) #Table of passengers surviving
prop.table(mytable)*100 #Percent of passengers surviving
## Survived
##        0        1
## 61.75478 38.24522
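The same percentage can be computed without building a table, as the mean of a logical vector. This is a minimal sketch, assuming `Survived` is coded 0/1 as in the output above. The vector `survived` is a hypothetical stand-in rebuilt from the printed counts (549 and 340), so the snippet runs without the CSV file.

```r
# Stand-in for titanic.df$Survived, rebuilt from the counts printed above
# (549 passengers coded 0, 340 coded 1); it is not read from the original CSV.
survived <- rep(c(0, 1), times = c(549, 340))

n_total      <- length(survived)           # total number of passengers
n_survived   <- sum(survived == 1)         # number of survivors
pct_survived <- mean(survived == 1) * 100  # share of survivors, in percent

round(pct_survived, 2)
```

With the real data, the last expression would be `mean(titanic.df$Survived == 1) * 100`.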
#TASK 3d Use R to count the number of first-class passengers who survived the sinking of the Titanic.
mytable <- xtabs(~Survived + Pclass, data=titanic.df) #Survival counts by class; first-class survivors are in row 1, column 1 (134)
mytable
##         Pclass
## Survived   1   2   3
##        0  80  97 372
##        1 134  87 119
#TASK 3e Use R to measure the percentage of first-class passengers who survived the sinking of the Titanic.
prop.table(mytable,2)*100 #Column percentages: survival rate within each class (first class: 62.6%)
##         Pclass
## Survived        1        2        3
##        0 37.38318 52.71739 75.76375
##        1 62.61682 47.28261 24.23625
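Tasks 3d and 3e ask for a single number each, which can be pulled out of the table by its dimension names instead of being read off the printout. The sketch below assumes the counts shown above; `class_tab` is a hypothetical stand-in for the `xtabs()` result, typed in by hand so the example runs without the CSV file.

```r
# Counts copied from the Survived x Pclass table printed above.
# matrix() fills column by column: each pair is (died, survived) for one class.
class_tab <- matrix(
  c(80, 134,    # Pclass 1
    97,  87,    # Pclass 2
    372, 119),  # Pclass 3
  nrow = 2,
  dimnames = list(Survived = c("0", "1"), Pclass = c("1", "2", "3"))
)

# Index by name: row Survived = "1", column Pclass = "1".
first_class_survivors <- class_tab["1", "1"]

# margin = 2 makes each column (class) sum to 100 percent.
first_class_pct <- prop.table(class_tab, margin = 2)["1", "1"] * 100

first_class_survivors
round(first_class_pct, 2)
```

Indexing by name rather than by position keeps the code correct even if the factor levels are reordered.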
#TASK 3f Use R to count the number of females from First-Class who survived the sinking of the Titanic
mytable <- xtabs(~Survived + Pclass + Sex, data=titanic.df) #Three-way counts; female first-class survivors: Survived = 1, Pclass = 1, Sex = female (89)
mytable
## , , Sex = female
##
##         Pclass
## Survived   1   2   3
##        0   3   6  72
##        1  89  70  72
##
## , , Sex = male
##
##         Pclass
## Survived   1   2   3
##        0  77  91 300
##        1  45  17  47
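A three-way table is a three-dimensional array, so the requested cell can be selected with three names. This is a sketch under the assumption that the counts above are correct; `counts` is a hypothetical hand-built stand-in for the `xtabs()` result.

```r
# Counts copied from the three-way table printed above.
# array() fills with the first dimension (Survived) varying fastest,
# then Pclass, then Sex.
counts <- array(
  c(3, 89, 6, 70, 72, 72,      # Sex = female: (died, survived) for classes 1, 2, 3
    77, 45, 91, 17, 300, 47),  # Sex = male:   (died, survived) for classes 1, 2, 3
  dim = c(2, 3, 2),
  dimnames = list(
    Survived = c("0", "1"),
    Pclass   = c("1", "2", "3"),
    Sex      = c("female", "male")
  )
)

# Select one cell by name: survived, first class, female.
female_first_survivors <- counts["1", "1", "female"]

# Collapse over Pclass to get the two-way Survived x Sex counts.
by_sex <- apply(counts, c(1, 3), sum)

female_first_survivors
by_sex
```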
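#TASK 3g Use R to measure the percentage of survivors who were female
mytable<-xtabs(~Survived+Sex, data=titanic.df) #Survival counts by sex
#Note: margin 2 gives column percentages, i.e. the share of each sex that survived (74.0% of females).
#The share of survivors who were female is a row percentage and needs margin 1 instead.
prop.table(mytable,2)*100
##         Sex
## Survived   female     male
##        0 25.96154 81.10919
##        1 74.03846 18.89081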
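The `margin` argument of `prop.table()` decides which question is answered: column percentages give the share of each sex that survived, while row percentages give the share of survivors who were female. The sketch below contrasts the two. It assumes the Survived by Sex counts obtained by summing the three-way table above over class (81, 231, 468, 109); `sex_tab` is a hypothetical hand-built stand-in for the `xtabs()` result.

```r
# Survived x Sex counts, summed over Pclass from the three-way table above.
sex_tab <- matrix(
  c(81, 231,    # female: died, survived
    468, 109),  # male:   died, survived
  nrow = 2,
  dimnames = list(Survived = c("0", "1"), Sex = c("female", "male"))
)

# margin = 2: each column (sex) sums to 100.
pct_within_sex <- prop.table(sex_tab, margin = 2) * 100

# margin = 1: each row (outcome) sums to 100.
pct_within_outcome <- prop.table(sex_tab, margin = 1) * 100

pct_females_who_survived  <- pct_within_sex["1", "female"]      # about 74.04
pct_survivors_were_female <- pct_within_outcome["1", "female"]  # about 67.94

round(c(pct_females_who_survived, pct_survivors_were_female), 2)
```

On these counts, 231 of the 340 survivors were female, which is the figure the task wording asks for.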
#TASK 3i Run a Pearson's Chi-squared test to test the following hypothesis:
#Hypothesis: The proportion of females onboard who survived the sinking of
#the Titanic was higher than the proportion of males onboard who survived the sinking of the Titanic.
mytable <- xtabs(~Survived + Sex, data=titanic.df) #Survival counts by sex
chisq.test(mytable)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: mytable
## X-squared = 258.43, df = 1, p-value < 2.2e-16
#The p-value for the Pearson's Chi-squared test is < 2.2e-16, which is far below 0.05. So we can reject the null
#hypothesis that a passenger's sex and survival are independent of each other.
#Conclusion
#The test itself only shows that sex and survival are associated. The direction comes from the proportions in
#Task 3g: 74.0% of females survived versus 18.9% of males. So the proportion of females on board who survived
#the sinking of the Titanic was higher than the proportion of males on board who survived.
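Because the hypothesis is directional, a one-sided two-sample test of proportions is one possible complement to the chi-squared test of independence. The sketch below uses `prop.test()` with `alternative = "greater"`; this is a swapped-in technique, not the one the task prescribes. The counts are assumptions taken from the tables above (231 of 312 females and 109 of 577 males survived), and the variable names are hypothetical.

```r
# Survivors and group sizes by sex, taken from the tables above.
survivors  <- c(female = 231, male = 109)
passengers <- c(female = 312, male = 577)

# H0: the two survival proportions are equal.
# H1: the first proportion (female) is greater than the second (male).
# The continuity correction is applied by default (correct = TRUE).
one_sided <- prop.test(x = survivors, n = passengers, alternative = "greater")

one_sided$estimate          # sample proportions for each group
one_sided$statistic         # X-squared statistic
one_sided$p.value < 0.05    # TRUE means H0 is rejected at the 5% level
```

For two groups, the X-squared statistic reported here is the same continuity-corrected statistic that `chisq.test()` returns for the 2 x 2 table; only the p-value changes to reflect the one-sided alternative.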