2b. First, we import the Titanic Data.csv file into an R data frame:
setwd("F:/Data Analytics for Managerial Applications")
titanic.df <- read.csv("Titanic Data.csv")
View(titanic.df)
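As a possible alternative to changing the working directory, the import can build the full path and check that the file exists before reading it. This is only a sketch: the folder and file name are the ones used above, and the variable names data_dir and csv_path are made up for illustration.

```r
# Folder and file name as used in the import step above
data_dir <- "F:/Data Analytics for Managerial Applications"
csv_path <- file.path(data_dir, "Titanic Data.csv")

if (file.exists(csv_path)) {
  titanic.df <- read.csv(csv_path)
  str(titanic.df)  # compact overview of the columns and their types
} else {
  message("File not found: ", csv_path)
}
```

Unlike View(), which needs an interactive session, str() also prints when the document is knitted.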
3a. To calculate the number of passengers on board the Titanic:
nrow(titanic.df)
## [1] 889
Therefore, the total number of passengers in the data set is 889.
3b. To calculate the number of passengers who survived the sinking of the Titanic:
ppl_survived <- with(titanic.df,table(Survived))
ppl_survived
## Survived
##   0   1
## 549 340
Therefore, since the value 1 (survived) occurs 340 times, 340 people survived the sinking.
3c. To calculate the percentage of passengers who survived the sinking of the Titanic:
prop.table(ppl_survived)*100
## Survived
##        0        1
## 61.75478 38.24522
Therefore, the percentage of passengers who survived (Survived = 1) is 38.25%.
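The count and the percentage can be cross-checked without building a table, because Survived is a 0/1 indicator. The sketch below rebuilds that column from the counts reported above (549 zeros and 340 ones) so that it runs on its own. The vector survived is a stand-in for titanic.df$Survived, not the original column.

```r
# Stand-in for titanic.df$Survived, rebuilt from the reported counts
survived <- rep(c(0, 1), times = c(549, 340))

# Summing a logical vector counts its TRUE values
n_survived <- sum(survived == 1)

# The mean of a logical vector is the share of TRUE values
pct_survived <- round(mean(survived == 1) * 100, 2)

n_survived
pct_survived
```

Both values should agree with the table() and prop.table() results in 3b and 3c.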
3d. To calculate the number of 1st class passengers who survived the sinking of the Titanic:
first_class_survived <- xtabs(~ Pclass+Survived, data = titanic.df)
first_class_survived
##       Survived
## Pclass   0   1
##      1  80 134
##      2  97  87
##      3 372 119
Therefore, as evident from the results, the number of first class survivors (the cell with Pclass = 1 and Survived = 1) is 134.
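Instead of reading the cell off the printed table, it can be extracted by its dimension names. The sketch below rebuilds the table from the counts printed above so that it is self-contained. The name class_by_survival is illustrative; with the real data the same indexing should apply to first_class_survived.

```r
# Pclass x Survived counts copied from the xtabs() output above
class_by_survival <- as.table(matrix(
  c( 80, 134,
     97,  87,
    372, 119),
  nrow = 3, byrow = TRUE,
  dimnames = list(Pclass = c("1", "2", "3"), Survived = c("0", "1"))
))

# Index by label: row Pclass = "1", column Survived = "1"
first_class_survivors <- class_by_survival["1", "1"]
first_class_survivors
```

Indexing by label rather than by position keeps the code correct even if the order of the levels changes.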
3e. To calculate the percentage of 1st class passengers who survived the sinking of the Titanic, we apply prop.table() row-wise (margin = 1), so that each class sums to 100%:
prop.table(first_class_survived,1)*100
##       Survived
## Pclass        0        1
##      1 37.38318 62.61682
##      2 52.71739 47.28261
##      3 75.76375 24.23625
Therefore, as evident from the results, the percentage of first class passengers who survived the sinking of the Titanic is 62.62%.
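To see what the row-wise prop.table() call does, the same percentages can be computed by dividing each row by its row total. This is a sketch using the counts printed in 3d. The names class_by_survival and row_pct are illustrative.

```r
# Pclass x Survived counts copied from the output in 3d
class_by_survival <- as.table(matrix(
  c( 80, 134,
     97,  87,
    372, 119),
  nrow = 3, byrow = TRUE,
  dimnames = list(Pclass = c("1", "2", "3"), Survived = c("0", "1"))
))

# Each row is divided by its own total, so every class sums to 100
row_pct <- class_by_survival / rowSums(class_by_survival) * 100

# First class: 134 / (80 + 134) * 100
round(row_pct["1", "1"], 2)
```

The denominator is the 214 first class passengers, not all 889 passengers, which is why the result differs from the overall survival rate in 3c.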
3f. To calculate the number of females from first class who survived the sinking of the Titanic:
first_female_survivors <- xtabs(~ Pclass+Sex+Survived, titanic.df)
ftable(first_female_survivors)
##               Survived   0   1
## Pclass Sex
## 1      female            3  89
##        male             77  45
## 2      female            6  70
##        male             91  17
## 3      female           72  72
##        male            300  47
Therefore, as evident from the results, the number of females from first class who survived the sinking of the Titanic is 89.
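The same cell can be pulled out of the three-way table by name. The sketch below rebuilds the table as an array from the counts printed above, with the dimensions in the same order as the xtabs() formula (Pclass, Sex, Survived). The name counts is illustrative; with the real data the same indexing should apply to first_female_survivors.

```r
# Values are listed with Pclass varying fastest, then Sex, then Survived
counts <- array(
  c( 3,  6,  72,   # female, Survived = 0, Pclass 1..3
    77, 91, 300,   # male,   Survived = 0, Pclass 1..3
    89, 70,  72,   # female, Survived = 1, Pclass 1..3
    45, 17,  47),  # male,   Survived = 1, Pclass 1..3
  dim = c(3, 2, 2),
  dimnames = list(Pclass   = c("1", "2", "3"),
                  Sex      = c("female", "male"),
                  Survived = c("0", "1"))
)

# First class, female, survived
first_class_female_survivors <- counts["1", "female", "1"]
first_class_female_survivors

# Consistency check against 3d: all first class survivors, both sexes
sum(counts["1", , "1"])
```

The second value should equal the 134 first class survivors found in 3d.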
3g. To calculate the percentage of survivors who were female:
fem_survived <- xtabs(~ Survived + Sex, data = titanic.df) # counts of passengers by survival status (rows) and sex (columns)
prop.table(fem_survived,1)*100
##         Sex
## Survived   female     male
##        0 14.75410 85.24590
##        1 67.94118 32.05882
Therefore, as evident from the results, the percentage of survivors who were female is 67.94%
3h. To calculate the percentage of females on board who survived, we apply prop.table() column-wise (margin = 2), so that each sex sums to 100%:
prop.table(fem_survived,2)*100
##         Sex
## Survived   female     male
##        0 25.96154 81.10919
##        1 74.03846 18.89081
Therefore, as evident from the results, the percentage of females on board who survived is 74.04%.
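Questions 3g and 3h use the same numerator (female survivors) but different denominators, which is what the margin argument of prop.table() controls. The sketch below makes the two denominators explicit. The counts are derived from the three-way table in 3f (for example, 3 + 6 + 72 = 81 females who did not survive), and the variable names are illustrative.

```r
# Survived x Sex counts, derived by summing the 3f table over Pclass
sex_by_survival <- as.table(matrix(
  c( 81, 468,
    231, 109),
  nrow = 2, byrow = TRUE,
  dimnames = list(Survived = c("0", "1"), Sex = c("female", "male"))
))

female_survivors <- sex_by_survival["1", "female"]

# 3g: of all survivors (row total), what share was female?
pct_of_survivors_female <- female_survivors / sum(sex_by_survival["1", ]) * 100

# 3h: of all females (column total), what share survived?
pct_of_females_survived <- female_survivors / sum(sex_by_survival[, "female"]) * 100

round(c(pct_of_survivors_female, pct_of_females_survived), 2)
```

The first figure describes the make-up of the survivors; the second describes the survival rate of females.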
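3i. To test the hypothesis that the proportion of females on board who survived is greater than the proportion of males on board who survived, we run a chi-squared test on the Survived-by-Sex table. Note that this test only checks whether survival and sex are independent; the direction of the difference comes from the proportions in 3h: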
chisq.test(fem_survived)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: fem_survived
## X-squared = 258.43, df = 1, p-value < 2.2e-16
The p-value is the probability of obtaining results at least as extreme as those observed, assuming that the row and column variables (Survived and Sex) are independent in the population.
Since the p-value (< 2.2e-16) is far below both 0.05 and 0.01, we reject the null hypothesis that survival and sex are independent. Combined with the proportions in 3h (74.04% of females survived versus 18.89% of males), this supports the hypothesis that a greater proportion of females than males survived.
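Because the hypothesis in 3i is directional, one possible complement to the chi-squared test is a one-sided two-sample test of proportions with prop.test(). This is a sketch and not part of the original analysis. The counts are derived from the tables above (231 of 312 females and 109 of 577 males survived), and the names survivors, on_board and one_sided are illustrative.

```r
# Survivors and group sizes, derived from the tables above
survivors <- c(female = 231, male = 109)
on_board  <- c(female = 312, male = 577)

# H0: both groups have the same survival proportion
# H1: the first group (female) has the greater proportion
# prop.test() applies Yates' continuity correction by default
one_sided <- prop.test(x = survivors, n = on_board, alternative = "greater")

one_sided$estimate  # sample proportions for the two groups
one_sided$p.value   # one-sided p-value
```

A small one-sided p-value here would support the directional claim directly, instead of inferring the direction from the proportions.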