First, I installed all the necessary packages. Now I load the libraries:
library(devtools)
library(ggplot2)
library(statsr)
library(dplyr)
titanic <- read.csv("titanic.csv", stringsAsFactors = FALSE)
View(titanic)
titanic$Pclass <- as.factor(titanic$Pclass)
titanic$Survived <- as.factor(titanic$Survived)
titanic$Sex <- as.factor(titanic$Sex)
titanic$Embarked <- as.factor(titanic$Embarked)
summary(titanic)
##  PassengerId    Survived   Pclass         Name            Sex
## Min.   :  1.0   0:549      1:216    Length:891         female:314
## 1st Qu.:223.5   1:342      2:184    Class :character   male  :577
## Median :446.0              3:491    Mode  :character
## Mean   :446.0
## 3rd Qu.:668.5
## Max.   :891.0
##
##      Age            SibSp           Parch             Ticket
## Min.   : 0.42   Min.   :0.000   Min.   :0.0000   Length:891
## 1st Qu.:20.12   1st Qu.:0.000   1st Qu.:0.0000   Class :character
## Median :28.00   Median :0.000   Median :0.0000   Mode  :character
## Mean   :29.70   Mean   :0.523   Mean   :0.3816
## 3rd Qu.:38.00   3rd Qu.:1.000   3rd Qu.:0.0000
## Max.   :80.00   Max.   :8.000   Max.   :6.0000
## NA's   :177
##      Fare             Cabin         Embarked
## Min.   :  0.00   Length:891          :  2
## 1st Qu.:  7.91   Class :character   C:168
## Median : 14.45   Mode  :character   Q: 77
## Mean   : 32.20                      S:644
## 3rd Qu.: 31.00
## Max.   :512.33
##
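The summary reports 177 missing ages (`NA's`) and an unnamed `Embarked` level holding 2 passengers. That blank level is likely there because `read.csv()` only treats the string `"NA"` as missing by default, so empty text fields stay as `""`. A minimal sketch, using a tiny made-up CSV string in place of `titanic.csv`, shows how the `na.strings` argument changes this:

```r
# Hypothetical four-row CSV standing in for titanic.csv (values made up).
csv_text <- "PassengerId,Age,Embarked
1,22,S
2,,C
3,26,
4,35,Q"

# Default: blank numeric fields become NA, but blank text fields stay "".
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Also treat empty strings as missing.
clean <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                  na.strings = c("", "NA"))

colSums(is.na(raw))    # Embarked shows 0 missing: the blank is ""
colSums(is.na(clean))  # Embarked now shows 1 missing
```

With the real file this would become `read.csv("titanic.csv", stringsAsFactors = FALSE, na.strings = c("", "NA"))`, which also turns empty `Cabin` entries into `NA`.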
ggplot(titanic, aes(x = Survived)) +
theme_bw() +
geom_bar() +
labs(y = "Passenger Count",
title = "Titanic Survival Rates")
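The bars show counts only. The overall survival rate can be worked out from the `Survived` counts in the summary output above (549 died, 342 survived); a small worked calculation:

```r
# Counts copied from the Survived column of summary(titanic) above.
survived_counts <- c(died = 549, survived = 342)

# Each count divided by the total number of passengers (891), as a percentage
survival_pct <- 100 * survived_counts / sum(survived_counts)
round(survival_pct, 1)
```

This gives roughly 38.4% survivors and 61.6% deaths.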
Next, let us find out how many males and females survived and died.
ggplot(titanic, aes(x = Sex, fill = Survived)) +
theme_bw() +
geom_bar() +
labs(y = "Passenger Count",
title = "Titanic Survival Rates by sex")
The stacked bar chart shows, for each sex, how many passengers survived and how many died.
Among females, more passengers survived than died, whereas among males, more died than survived. Apart from sex, the ticket class may also have played a role.
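Because the stacked bars show counts, the proportions within each sex have to be judged by eye. A minimal sketch of computing them directly with base R's `table()` and `prop.table()`, run here on a small hypothetical `toy` data frame (with the real data, `titanic$Sex` and `titanic$Survived` would take its place):

```r
# Hypothetical toy data (not the real passengers).
toy <- data.frame(
  Sex      = factor(c("female", "female", "female", "male", "male", "male", "male")),
  Survived = factor(c(1, 1, 0, 0, 0, 0, 1))
)

# Contingency table: rows = sex, columns = outcome (0 = died, 1 = survived)
counts <- table(toy$Sex, toy$Survived)

# margin = 1 divides each cell by its row total, so every row sums to 1
props <- prop.table(counts, margin = 1)
round(props, 2)
```

In ggplot2, `geom_bar(position = "fill")` draws the same within-group proportions by stretching each stacked bar to a height of 1.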
ggplot(titanic, aes(x = Pclass, fill = Survived)) +
theme_bw() +
geom_bar() +
labs(y = "Passenger Count",
title = "Titanic Survival Rates by class")
Passengers in third class had a markedly poor survival rate. In first class, by contrast, more passengers survived than died. Interestingly, the survival rate of second-class passengers is close to 50%.
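One way to check the "close to 50%" reading is to compute a survival rate per class as the mean of a 0/1 indicator. Since `Survived` was converted to a factor earlier, it has to go through `as.character()` first: `as.integer()` on a factor returns the level codes (1, 2), not the labels (0, 1). The `toy` data frame below is hypothetical:

```r
# Hypothetical toy data standing in for `titanic`; values are made up.
toy <- data.frame(
  Pclass   = factor(c(1, 1, 1, 2, 2, 3, 3, 3, 3)),
  Survived = factor(c(1, 1, 0, 1, 0, 0, 0, 1, 0))
)

# Recover the original 0/1 values from the factor labels
toy$survived01 <- as.integer(as.character(toy$Survived))

# The mean of a 0/1 indicator within each class is that class's survival rate
rate_by_class <- aggregate(survived01 ~ Pclass, data = toy, FUN = mean)
rate_by_class
```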
ggplot(titanic, aes(x = Sex, fill = Survived)) +
theme_bw() +
facet_wrap(~ Pclass) +
geom_bar() +
labs(y = "Passenger Count",
title = "Titanic Survival Rates by class and sex")
There are three panels, one for each class, and each panel shows females and males. Based on the chart above, females in first class overwhelmingly survived, and the same holds for second class. In third class, however, women had roughly equal chances of survival and death. Sadly, men had lower survival rates than women in all three classes.
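To put numbers on the faceted chart, the per-group rate extends to two grouping variables: `tapply()` with a list of factors returns a sex-by-class matrix of survival rates. The `toy` data frame is hypothetical and, to keep it short, only contains classes 1 and 3:

```r
# Hypothetical toy data, not the real passenger list.
toy <- data.frame(
  Sex      = factor(c("female", "female", "female", "female",
                      "male", "male", "male", "male", "male")),
  Pclass   = factor(c(1, 1, 3, 3, 1, 1, 3, 3, 3)),
  Survived = factor(c(1, 1, 1, 0, 1, 0, 0, 0, 0))
)

# Factor labels back to 0/1 (as.integer() alone would give level codes)
survived01 <- as.integer(as.character(toy$Survived))

# Rows = sex, columns = class, cells = survival rate in that combination
rate_tab <- tapply(survived01, list(Sex = toy$Sex, Pclass = toy$Pclass), mean)
round(rate_tab, 2)
```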
Like sex and class, age may also have affected survival. To look at the age distribution, I set the histogram bin width to five years.
ggplot(titanic, aes(x = Age)) +
theme_bw() +
geom_histogram(binwidth = 5) +
labs(y = "Passenger Count",
x = "Age with binwidth 5 years",
title = "Titanic passengers age distribution")
## Warning: Removed 177 rows containing non-finite values (stat_bin).
Passengers in their 20s make up the largest age group. There were also some elderly passengers, up to 80 years old, as well as some children.
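The five-year bins and the "Removed 177 rows" warning can be reproduced by hand with `cut()`: each age is assigned to a bin, and ages that are `NA` (the 177 missing values in the summary) belong to no bin. The ages below are made up. Note that ggplot2 chooses its own bin edges unless told otherwise, so its bars need not line up with these intervals; setting `boundary = 0` in `geom_histogram()` should align them.

```r
# Hypothetical ages (made up); NA marks a passenger whose age is unknown.
ages <- c(0.42, 4, 22, 24, 27, 29, 35, NA, 61, 80, NA)

# Left-closed 5-year bins: [0,5), [5,10), ..., [80,85)
bins <- cut(ages, breaks = seq(0, 85, by = 5), right = FALSE)

# useNA = "ifany" adds a count for missing ages, the rows that
# geom_histogram() drops with the "Removed ... rows" warning
counts <- table(bins, useNA = "ifany")
counts[counts > 0]
```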
ggplot(titanic, aes(x = Age, fill = Survived)) +
theme_bw() +
geom_histogram(binwidth = 5) +
labs(y = "Passenger Count",
x = "Age with binwidth 5 years",
title = "Titanic passengers age distribution by survival")
## Warning: Removed 177 rows containing non-finite values (stat_bin).
Children, especially those between 0 and 5 years old, had high survivability. At the upper end of the age distribution, however, the death rate is markedly higher, especially between 50 and 70 (except for one 80-year-old passenger, who survived).
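A rough numeric check of this reading is to group ages into coarse bands and take the survival rate in each band. The breakpoints 5, 50 and 70 are my own choice, picked to match the ranges mentioned above, and the `toy` rows are made up:

```r
# Hypothetical toy rows (not the real data); one age is unknown.
toy <- data.frame(
  Age      = c(1, 2, 4, 25, 30, 55, 62, 68, 80, NA),
  Survived = factor(c(0, 1, 1, 0, 1, 0, 0, 0, 1, 1))
)

# Drop unknown ages explicitly, as the histograms above do implicitly
known <- toy[!is.na(toy$Age), ]
survived01 <- as.integer(as.character(known$Survived))

# Coarse, left-closed age groups: [0,5), [5,50), [50,70), [70,Inf)
age_group <- cut(known$Age, breaks = c(0, 5, 50, 70, Inf), right = FALSE)

# Survival rate within each age group
rate_by_age <- tapply(survived01, age_group, mean)
round(rate_by_age, 2)
```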
ggplot(titanic, aes(x = Age, fill = Survived)) +
theme_bw() +
facet_wrap(Sex ~ Pclass) +
geom_histogram(alpha = 0.5) +
labs(y = "Passenger Count",
x = "Age",
title = "Survival Rates by Age, Pclass, and Sex")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 177 rows containing non-finite values (stat_bin).
Females in first class had an extremely high survival rate; only a few of them died. The picture is almost the same in second class. Females in third class, however, had a noticeably lower survival rate.
On the other hand, for males, the survival rate is highest in first class, followed by second class. The survival rate for third-class males was extremely low.
The Titanic data is available on the Kaggle website (Kaggle competitions): https://www.kaggle.com/c/titanic/data