R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks.

Reading Titanic Dataset

2b. First, we import the Titanic Data.csv file into an R data frame:

# Point R at the folder that contains the data file
setwd("F:/Data Analytics for Managerial Applications")
# Read the CSV into a data frame
titanic.df <- read.csv("Titanic Data.csv")
View(titanic.df)
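
As a hedged alternative to setwd(), the file can be read through an explicit path built with file.path(). The sketch below is self-contained: it writes a tiny, made-up three-row CSV to a temporary folder as a stand-in for the real file, so the toy data frame, `data_dir`, and `titanic_toy` are illustrative assumptions, and only the column names are taken from the analysis that follows.

```r
# Hypothetical stand-in for "Titanic Data.csv": a tiny CSV in a temp folder
data_dir <- tempdir()
csv_path <- file.path(data_dir, "Titanic Data.csv")
toy <- data.frame(
  Survived = c(0, 1, 1),
  Pclass   = c(3, 1, 2),
  Sex      = c("male", "female", "female")
)
write.csv(toy, csv_path, row.names = FALSE)

# Read via the full path instead of changing the working directory
titanic_toy <- read.csv(csv_path, stringsAsFactors = FALSE)
str(titanic_toy)
```

With the real data, `data_dir` would simply be replaced by the folder that holds the file.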

Total number of passengers

3a. To calculate the number of passengers on board the Titanic:

nrow(titanic.df)
## [1] 889

Therefore, the total number of passengers in the dataset is 889.

Total survivors

3b. To calculate the number of passengers who survived the sinking of the Titanic:

ppl_survived <- with(titanic.df,table(Survived))
ppl_survived
## Survived
##   0   1 
## 549 340

Therefore, since the value 1 occurs 340 times, 340 people survived the sinking.

Percentage of survivors

3c. To calculate the percentage of passengers who survived the sinking of the Titanic:

prop.table(ppl_survived)*100
## Survived
##        0        1 
## 61.75478 38.24522

Therefore, the percentage of people who survived is the percentage for the value 1, which is 38.25%.
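
The same count and percentage can be cross-checked without table(), since the mean of a logical vector is the proportion of TRUE values. This is a minimal sketch that rebuilds a 0/1 vector from the counts reported above (549 and 340); the vector `survived` is an assumption standing in for titanic.df$Survived.

```r
# Rebuild a 0/1 survival vector from the reported counts (549 died, 340 survived)
survived <- rep(c(0, 1), times = c(549, 340))

# Number of survivors: count the entries equal to 1
n_survived <- sum(survived == 1)

# Percentage of survivors: mean of a logical vector is the share of TRUE values
pct_survived <- round(mean(survived == 1) * 100, 2)

n_survived
pct_survived
```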

Number of 1st class survivors

3d. To calculate the number of 1st class passengers who survived the sinking of the Titanic:

first_class_survived <- xtabs(~ Pclass+Survived, data = titanic.df)
first_class_survived
##       Survived
## Pclass   0   1
##      1  80 134
##      2  97  87
##      3 372 119

Therefore, as evident from the results, the number of first class survivors is the count for Pclass = 1 and Survived = 1, which is 134.

Percentage of 1st class survivors

3e. To calculate the percentage of 1st class passengers who survived the sinking of the Titanic:

prop.table(first_class_survived,1)*100
##       Survived
## Pclass        0        1
##      1 37.38318 62.61682
##      2 52.71739 47.28261
##      3 75.76375 24.23625

Therefore, as evident from the results, the percentage of first class passengers who survived the sinking of the Titanic is 62.62%.

Number of 1st class female survivors

3f. To calculate the number of females from first class who survived the sinking of the Titanic:

first_female_survivors <- xtabs(~ Pclass+Sex+Survived, titanic.df)
ftable(first_female_survivors)
##               Survived   0   1
## Pclass Sex                    
## 1      female            3  89
##        male             77  45
## 2      female            6  70
##        male             91  17
## 3      female           72  72
##        male            300  47

Therefore, as evident from the results, the number of females from first class who survived the sinking of the Titanic is 89.
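
Rather than reading the value off the printed table, a single cell of a three-way table can be pulled out by indexing with the level names. The sketch below assumes the counts shown in the ftable() output above and rebuilds the table from them; `counts` and `tab3` are illustrative names, not objects from the original analysis.

```r
# Counts copied from the ftable() output, one row per Pclass/Sex/Survived cell
counts <- data.frame(
  Pclass   = rep(c("1", "2", "3"), each = 4),
  Sex      = rep(rep(c("female", "male"), each = 2), times = 3),
  Survived = rep(c("0", "1"), times = 6),
  Freq     = c(3, 89, 77, 45, 6, 70, 91, 17, 72, 72, 300, 47)
)

# Rebuild the three-way contingency table from the pre-tabulated counts
tab3 <- xtabs(Freq ~ Pclass + Sex + Survived, data = counts)

# Index by level names: Pclass "1", Sex "female", Survived "1"
first_class_female_survivors <- tab3["1", "female", "1"]
first_class_female_survivors
```

With the real data, the same indexing should apply directly to first_female_survivors.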

Percentage of survivors who were female

3g. To calculate the percentage of survivors who were female:

fem_survived <- xtabs(~ Survived + Sex, data = titanic.df) # this stores the counts of passengers by survival status and sex
prop.table(fem_survived,1)*100
##         Sex
## Survived   female     male
##        0 14.75410 85.24590
##        1 67.94118 32.05882

Therefore, as evident from the results, the percentage of survivors who were female is 67.94%

Percentage of females on board who survived

3h. To calculate the percentage of females on board who survived, we apply prop.table() column-wise (margin = 2):

prop.table(fem_survived,2)*100
##         Sex
## Survived   female     male
##        0 25.96154 81.10919
##        1 74.03846 18.89081

Therefore, as evident from the results, the percentage of females on board who survived is 74.04%.
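
The difference between 3g and 3h comes down to the margin argument of prop.table(): margin = 1 makes each row sum to 100%, while margin = 2 makes each column sum to 100%. The sketch below illustrates this on a matrix whose counts are obtained by summing the ftable() output in 3f over passenger class (for example, 89 + 70 + 72 = 231 female survivors); `sex_by_survival` is an assumed stand-in for fem_survived.

```r
# Survival-by-sex counts, summed over Pclass from the three-way table in 3f
sex_by_survival <- matrix(
  c(81, 231, 468, 109),  # filled column by column: female, then male
  nrow = 2,
  dimnames = list(Survived = c("0", "1"), Sex = c("female", "male"))
)

# margin = 1: each row sums to 100 (share of each sex among survivors / non-survivors)
row_pct <- prop.table(sex_by_survival, margin = 1) * 100

# margin = 2: each column sums to 100 (survival rate within each sex)
col_pct <- prop.table(sex_by_survival, margin = 2) * 100

round(row_pct["1", "female"], 2)  # share of survivors who were female
round(col_pct["1", "female"], 2)  # share of females who survived
```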

Run a Pearson’s Chi-Squared test

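3i. We run a chi-squared test to examine the hypothesis that the proportion of females on board who survived the sinking is greater than the proportion of males on board who survived. Note that chisq.test() tests whether survival and sex are independent; it does not by itself indicate the direction of the difference.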

chisq.test(fem_survived)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  fem_survived
## X-squared = 258.43, df = 1, p-value < 2.2e-16

As we know, the p-value is the probability of obtaining results at least as extreme as those observed, assuming that the row and column variables (here, Survived and Sex) are independent in the population.

Therefore, since the p-value is far below 0.05, and also below 0.01, we reject the null hypothesis that survival and sex are independent. Together with the proportions in 3h (74.04% of females versus 18.89% of males survived), this supports the claim that a greater proportion of females than males survived.
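
Because the stated hypothesis is directional, one possible complement to chisq.test() is a one-sided two-sample test of proportions with prop.test() and alternative = "greater". This is a sketch under stated assumptions, not the method used in the original analysis: the counts are taken from the tables above (231 of 312 females and 109 of 577 males survived), and `sex_by_survival` is an assumed stand-in for fem_survived.

```r
# Survival-by-sex counts, summed over Pclass from the three-way table in 3f
sex_by_survival <- matrix(
  c(81, 231, 468, 109),  # filled column by column: female, then male
  nrow = 2,
  dimnames = list(Survived = c("0", "1"), Sex = c("female", "male"))
)

# Two-sided test of independence, as in the analysis above
chi <- chisq.test(sex_by_survival)

# One-sided test: is the female survival proportion greater than the male one?
one_sided <- prop.test(
  x = c(female = 231, male = 109),  # survivors in each group
  n = c(female = 312, male = 577),  # group sizes
  alternative = "greater"
)

chi$statistic
one_sided$estimate
one_sided$p.value
```

A very small one-sided p-value here would point in the same direction as the proportions in 3h.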