R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Including Plots

You can also embed plots, for example:

#TASK 2b - Reading the dataset

#Read the data and store it in a dataframe "titanic.df"
titanic.df <- read.csv("Titanic Data.csv")
View(titanic.df)

#TASK 3a Use R to count the total number of passengers on board the Titanic.
nrow(titanic.df) #Total No. of passengers aboard (one row per passenger; data already loaded in Task 2b)
## [1] 889
#TASK 3b Use R to count the number of passengers who survived the sinking of the Titanic.
table(titanic.df$Survived) #Passenger counts by outcome: 0 = did not survive, 1 = survived
## 
##   0   1 
## 549 340
#TASK 3c Use R to measure the percentage of passengers who survived the sinking of the Titanic.
mytable <- with(titanic.df, table(Survived)) #Passenger counts by survival status
prop.table(mytable) * 100 #Percentage of passengers by survival status (row "1" = survived)
## Survived
##        0        1 
## 61.75478 38.24522
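
The count and percentage from Tasks 3b and 3c can also be read off directly as single numbers. The sketch below is a minimal illustration that rebuilds a 0/1 vector from the counts printed above instead of reading the CSV; the name `survived` is a stand-in for `titanic.df$Survived`, and the 0/1 coding (1 = survived) is assumed.

```r
# Stand-in for titanic.df$Survived, rebuilt from the printed counts
# (549 zeros and 340 ones); assumes 1 means "survived".
survived <- rep(c(0, 1), times = c(549, 340))

# Number of survivors: count the entries equal to 1.
n_survived <- sum(survived == 1)

# Percentage of survivors: the mean of a logical vector is the share of TRUEs.
pct_survived <- mean(survived == 1) * 100

n_survived
round(pct_survived, 2)
```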
#TASK 3d Use R to count the number of first-class passengers who survived the sinking of the Titanic.
mytable <- xtabs(~Survived + Pclass, data=titanic.df) #Survival counts by class; first-class survivors are in row 1, column 1
mytable
##         Pclass
## Survived   1   2   3
##        0  80  97 372
##        1 134  87 119
#TASK 3e Use R to measure the percentage of first-class passengers who survived the sinking of the Titanic. 
prop.table(mytable, 2) * 100 #Column percentages: survival rate within each class (first class is column 1)
##         Pclass
## Survived        1        2        3
##        0 37.38318 52.71739 75.76375
##        1 62.61682 47.28261 24.23625
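
Tasks 3d and 3e ask for one figure each, while the tables above show every class. A possible way to pull out just the first-class cell is to index the table by its dimension names. This sketch re-enters the counts from the Survived x Pclass table by hand; `class_tab` is a hypothetical name standing in for `mytable`.

```r
# Counts copied from the Survived x Pclass table printed above.
# matrix() fills column by column, so each pair is one passenger class.
class_tab <- as.table(matrix(
  c(80, 134,    # Pclass 1: did not survive, survived
    97, 87,     # Pclass 2
    372, 119),  # Pclass 3
  nrow = 2,
  dimnames = list(Survived = c("0", "1"), Pclass = c("1", "2", "3"))
))

# Index by name: row "1" (survived), column "1" (first class).
first_class_survivors <- class_tab["1", "1"]

# Column percentages, then the same cell: survival rate within first class.
first_class_pct <- prop.table(class_tab, 2)["1", "1"] * 100

first_class_survivors
round(first_class_pct, 2)
```

Indexing by name rather than by position keeps the lookup correct even if the level order changes.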
#TASK 3f Use R to count the number of females from First-Class who survived the sinking of the Titanic
mytable <- xtabs(~Survived + Pclass + Sex, data=titanic.df) #Survival counts by class and sex; first-class female survivors are in the "female" panel, row 1, column 1
mytable
## , , Sex = female
## 
##         Pclass
## Survived   1   2   3
##        0   3   6  72
##        1  89  70  72
## 
## , , Sex = male
## 
##         Pclass
## Survived   1   2   3
##        0  77  91 300
##        1  45  17  47
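
The three-way table can be indexed the same way to answer Task 3f with a single number. The following is a sketch only: it rebuilds the array from the counts printed above, and `sex_class_tab` is a hypothetical stand-in for `mytable`.

```r
# Counts copied from the two panels above. array() fills the first
# dimension fastest: Survived, then Pclass, then Sex.
sex_class_tab <- array(
  c(3, 89, 6, 70, 72, 72,      # Sex = female
    77, 45, 91, 17, 300, 47),  # Sex = male
  dim = c(2, 3, 2),
  dimnames = list(
    Survived = c("0", "1"),
    Pclass   = c("1", "2", "3"),
    Sex      = c("female", "male")
  )
)

# Survived = "1", Pclass = "1", Sex = "female"
first_class_female_survivors <- sex_class_tab["1", "1", "female"]
first_class_female_survivors

# Collapsing over Sex recovers the Survived x Pclass counts.
apply(sex_class_tab, c(1, 2), sum)
```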
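#TASK 3g Use R to measure the percentage of survivors who were female
mytable <- xtabs(~Survived + Sex, data=titanic.df) #Survival counts by sex
#Margin 2 normalizes within each Sex column, so the output below is the percentage of each sex that survived.
#The percentage of survivors who were female, as the task is worded, would use margin 1 (row percentages) instead.
prop.table(mytable, 2) * 100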
##         Sex
## Survived   female     male
##        0 25.96154 81.10919
##        1 74.03846 18.89081
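
The `margin` argument of `prop.table()` decides which question the percentages answer. The sketch below contrasts the two margins on the Survived x Sex counts; the counts are obtained by summing the three-way table from Task 3f over class, and `sex_tab` is a hypothetical stand-in for `mytable`.

```r
# Counts derived by summing the Task 3f panels over Pclass:
# female: 81 did not survive, 231 survived; male: 468 and 109.
sex_tab <- as.table(matrix(
  c(81, 231, 468, 109),
  nrow = 2,
  dimnames = list(Survived = c("0", "1"), Sex = c("female", "male"))
))

# margin = 2: each Sex column sums to 100.
# Answers "what percentage of females (or males) survived?"
by_sex <- prop.table(sex_tab, 2) * 100

# margin = 1: each Survived row sums to 100.
# Answers "what percentage of survivors were female (or male)?"
by_outcome <- prop.table(sex_tab, 1) * 100

round(by_sex["1", "female"], 2)      # share of females who survived
round(by_outcome["1", "female"], 2)  # share of survivors who were female
```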
#TASK 3i Run a Pearson's Chi-squared test to test the following hypothesis:

#Hypothesis:  The proportion of females onboard who survived the sinking of
#the Titanic was higher than the proportion of males onboard who survived the sinking of the Titanic.

mytable <- xtabs(~Survived + Sex, data=titanic.df) #Survival counts by sex
chisq.test(mytable)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  mytable
## X-squared = 258.43, df = 1, p-value < 2.2e-16
#The p-value for the Pearson's Chi-squared test is less than 2.2e-16, which is far below 0.05. So, we can reject
#the null hypothesis that a passenger's sex and survival are independent of each other.
#The test itself is not directional; the direction comes from the Task 3g percentages,
#where 74.04% of females survived compared with 18.89% of males.

#Conclusion
#The proportion of females on board who survived the sinking of the Titanic was higher than the
#proportion of males on board who survived the sinking of the Titanic.
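
Because the hypothesis is directional ("higher than") and `chisq.test()` only tests for independence, a one-sided two-sample test of proportions may be a closer match. This is a sketch of an alternative, not the method used above: it calls `prop.test()` with `alternative = "greater"` on counts derived from the Task 3f table, and the variable names are illustrative.

```r
# Survivors and group sizes derived from the tables above
# (females: 231 of 312 survived; males: 109 of 577 survived).
female_survived <- 231
female_total    <- 312
male_survived   <- 109
male_total      <- 577

# H0: the female survival proportion is not greater than the male one.
# H1: the female survival proportion is greater.
# The continuity correction is applied by default (correct = TRUE).
one_sided <- prop.test(
  x = c(female_survived, male_survived),
  n = c(female_total, male_total),
  alternative = "greater"
)

one_sided$estimate  # sample proportions for females and males
one_sided$p.value   # one-sided p-value
```

A small one-sided p-value here supports the directional conclusion directly, rather than inferring the direction from the percentages.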
