Imported the ggplot2 package, which I use throughout this analysis to draw the charts shown in the HTML output.

library(ggplot2)

Downloaded the dataset and imported it into R.

titanic = read.csv("train.csv")

Tidied up the dataset by converting the categorical columns to factors.

titanic$Survived = as.factor(titanic$Survived)
titanic$Pclass = as.factor(titanic$Pclass)
titanic$Sex = as.factor(titanic$Sex)
titanic$Embarked = as.factor(titanic$Embarked)
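
The four conversions above can also be written as a single step. This is only a minimal sketch: `toy` is a made-up stand-in for the Titanic data (its values are invented for illustration), and `factor_cols` is a name I introduce here.

```r
# Toy stand-in for the Titanic data; the values are made up for illustration.
toy <- data.frame(
  Survived = c(0, 1, 1, 0),
  Pclass   = c(3, 1, 2, 3),
  Sex      = c("male", "female", "female", "male"),
  Embarked = c("S", "C", "S", "Q"),
  Age      = c(22, 38, 26, 35),
  stringsAsFactors = FALSE
)

# Columns that hold categories rather than measurements
factor_cols <- c("Survived", "Pclass", "Sex", "Embarked")

# Convert all of them in one step; other columns (Age) are left unchanged
toy[factor_cols] <- lapply(toy[factor_cols], as.factor)

# Inspect the resulting column types
str(toy)
```

With the real data, the same two lines should work by replacing `toy` with `titanic`.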

Now I’m going to look at the number of passengers who survived and who didn’t.

ggplot(titanic, aes(x = Survived)) + theme_bw() + geom_bar(fill = "#FE3276") + labs(y = "Passenger Count", x = "Titanic Survival")

prop.table(table(titanic$Survived))
## 
##         0         1 
## 0.6161616 0.3838384
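
These proportions are easier to read as rounded percentages. A minimal sketch, assuming `train.csv` has 891 rows, in which case the proportions above correspond to 549 and 342 passengers; `survival_counts` and `survival_pct` are names I introduce here.

```r
# Counts implied by the proportions above, assuming 891 rows in train.csv
survival_counts <- c("0" = 549, "1" = 342)

# Convert the counts to percentages rounded to one decimal place
survival_pct <- round(100 * prop.table(survival_counts), 1)
print(survival_pct)
```

With the real data, the equivalent call would be `round(100 * prop.table(table(titanic$Survived)), 1)`.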

The proportion table shows the share of passengers who survived and who didn’t: about 62% of the passengers in this dataset did not survive the sinking of the Titanic. Next, we can break the data down by sex to see whether men or women had a higher chance of surviving.

ggplot(titanic, aes(x = Sex, fill = Survived)) + theme_bw() + geom_bar() + labs(y = "Passenger Count", x = "Titanic Survival by Sex")

This chart shows that most of the survivors were female, while most of the passengers who died were male. Now I’m going to break the data down by socio-economic class, and then combine this with the previous breakdown to see which sex and socio-economic class were most likely to survive the Titanic.

ggplot(titanic, aes(x = Pclass, fill = Survived)) + theme_bw() + geom_bar() + labs(y = "Passenger Count", x = "Titanic Survival by Ticket Class")

From the above chart we see that Class 3 (the lowest ticket class) accounts for the majority of the passengers who died on the Titanic. The graph can make it look like more Class 1 (First Class) passengers died than Class 2 passengers, but there are fewer Class 2 passengers and a larger share of them did not survive. The table below gives each group as a share of all passengers: Class 2 passengers who died make up about 10.9% of the total, compared with about 9.0% for Class 1.

prop.table(table(titanic$Survived, titanic$Pclass))
##    
##              1          2          3
##   0 0.08978676 0.10886644 0.41750842
##   1 0.15263749 0.09764310 0.13355780
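
The table above reports each cell as a share of all passengers. To compare survival within each ticket class, the proportions can be taken per column instead. A minimal sketch, assuming `train.csv` has 891 rows so that the proportions above translate into the counts below; `class_counts` and `within_class` are names I introduce here.

```r
# Counts implied by the table above, assuming 891 rows in train.csv
class_counts <- matrix(
  c(80, 136,    # class 1: died, survived
    97,  87,    # class 2: died, survived
    372, 119),  # class 3: died, survived
  nrow = 2,
  dimnames = list(Survived = c("0", "1"), Pclass = c("1", "2", "3"))
)

# margin = 2 makes each column (ticket class) sum to 1
within_class <- prop.table(class_counts, margin = 2)
print(round(within_class, 3))
```

Under that assumption, roughly 37% of Class 1, 53% of Class 2, and 76% of Class 3 passengers did not survive. With the real data, the equivalent call would be `prop.table(table(titanic$Survived, titanic$Pclass), margin = 2)`.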

Now we can combine both breakdowns of the Titanic data above to see which sex and socio-economic class were most likely to survive the sinking of the Titanic.

ggplot(titanic, aes(x = Sex, fill = Survived)) + theme_bw() + facet_wrap(~ Pclass) + geom_bar() + labs(y = "Passenger Count", x = "Titanic Survival by Sex and Class")

From this bar chart we can see that males in Class 3 were most likely to die and females in Class 1 were most likely to live. We can now make a mosaic plot to analyze and show this data in a different way.

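# Packages needed for the mosaic plot
packages = c('vcd', 'vcdExtra', 'tidyverse')
for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  # Attach the package (does nothing if require() already attached it)
  library(p, character.only = TRUE)
}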
## Loading required package: vcd
## Loading required package: grid
## Loading required package: vcdExtra
## Loading required package: gnm
## Loading required package: tidyverse
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ tibble  3.1.2     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ✓ purrr   0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter()    masks stats::filter()
## x dplyr::lag()       masks stats::lag()
## x dplyr::summarise() masks vcdExtra::summarise()
mosaic(~ Survived + Sex + Pclass, data = titanic, main = "Survival on the Titanic", shade = TRUE, legend = TRUE)

This mosaic plot shows in a different way that most males in Class 3 did not survive the Titanic, while females in Class 3 were split evenly between survivors and non-survivors. It also shows that the best chance of surviving the Titanic disaster belonged to females in First Class. We can verify this with a table that breaks survival down by class and sex, with each cell given as a share of all passengers. In the table below, Class 1 females who died make up only about 0.3% of all passengers, against about 10.2% for Class 1 females who survived, the most favorable ratio of any sex and class.

prop.table(table(titanic$Survived, titanic$Pclass, titanic$Sex))
## , ,  = female
## 
##    
##               1           2           3
##   0 0.003367003 0.006734007 0.080808081
##   1 0.102132435 0.078563412 0.080808081
## 
## , ,  = male
## 
##    
##               1           2           3
##   0 0.086419753 0.102132435 0.336700337
##   1 0.050505051 0.019079686 0.052749719
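
Because the table above gives shares of all passengers, the survival rate inside each class and sex has to be read off by comparing the two rows. The rates can also be computed directly. A minimal sketch, assuming `train.csv` has 891 rows so that the proportions above translate into the counts below; `counts` and `rates` are names I introduce here.

```r
# Counts implied by the table above, assuming 891 rows in train.csv.
# Values fill in the order Survived (0, 1), then Pclass (1, 2, 3), then Sex.
counts <- array(
  c(3, 91, 6, 70, 72, 72,      # female: died/survived for classes 1, 2, 3
    77, 45, 91, 17, 300, 47),  # male:   died/survived for classes 1, 2, 3
  dim = c(2, 3, 2),
  dimnames = list(
    Survived = c("0", "1"),
    Pclass   = c("1", "2", "3"),
    Sex      = c("female", "male")
  )
)

# margin = c(2, 3) makes each class-by-sex group sum to 1
rates <- prop.table(counts, margin = c(2, 3))

# Survival rate (Survived == "1") for every class and sex
print(round(rates["1", , ], 3))
```

Under that assumption, about 97% of Class 1 females survived, compared with about 14% of Class 3 males. With the real data, the equivalent call would be `prop.table(table(titanic$Survived, titanic$Pclass, titanic$Sex), margin = c(2, 3))`.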