This file contains an analysis of the Titanic dataset. The dataset contains the following columns:
survival: Survival (0 = No, 1 = Yes)
pclass: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
sex: Sex
Age: Age in years
sibsp: # of siblings / spouses aboard the Titanic
parch: # of parents / children aboard the Titanic
ticket: Ticket number
fare: Passenger fare
cabin: Cabin number
embarked: Port of Embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
To start the analysis, I loaded the ggplot2 package in RStudio. ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. I provide the data and tell ggplot2 how to map variables to aesthetics and which graphical primitives to use, and it takes care of the details.
getwd()
## [1] "E:/All C drive/Desktop"
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.1.3
train <- read.csv("train.csv", stringsAsFactors = FALSE)
View(train)
train$Pclass <- as.factor(train$Pclass)
train$Survived <- as.factor(train$Survived)
train$Sex <- as.factor(train$Sex)
train$Ticket <- as.factor(train$Ticket)
# plot the number of passengers by survival status
ggplot(train, aes(x = Survived)) +
geom_bar()
# proportions of passengers who died (0) and survived (1)
prop.table(table(train$Survived))
##
## 0 1
## 0.6161616 0.3838384
# adding customization
ggplot(train, aes(x = Survived)) +
theme_bw()+
geom_bar()+
labs(x = "Survived (0 = No, 1 = Yes)",
y = "Passenger count",
title = "Titanic survival rates")
# survival rate by gender - looking at two aspects
ggplot(train, aes(x = Sex, fill = Survived)) +
theme_bw()+
geom_bar()+
labs(y = "Passenger count",
title = "Titanic survival rates by sex")
# survival rate by ticket class (Pclass) - looking at two aspects
ggplot(train, aes(x = Pclass, fill = Survived)) +
theme_bw()+
geom_bar()+
labs(y = "Passenger count",
title = "Titanic survival rates by Pclass")
# survival rate by ticket class (Pclass) and sex
ggplot(train, aes(x = Sex, fill = Survived)) +
theme_bw()+
facet_wrap(~ Pclass)+
geom_bar()+
labs(y = "Passenger count",
title = "Titanic survival rates by Pclass and sex")
prop.table(table(train$Survived, train$Pclass))
##
## 1 2 3
## 0 0.08978676 0.10886644 0.41750842
## 1 0.15263749 0.09764310 0.13355780
prop.table(table(train$Survived, train$Pclass, train$Sex))
## , , = female
##
##
## 1 2 3
## 0 0.003367003 0.006734007 0.080808081
## 1 0.102132435 0.078563412 0.080808081
##
## , , = male
##
##
## 1 2 3
## 0 0.086419753 0.102132435 0.336700337
## 1 0.050505051 0.019079686 0.052749719
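The tables above give each cell as a share of all passengers, so they do not directly show a survival rate within a group. A minimal sketch of how within-group rates could be computed follows. It assumes train.csv has 891 rows, which lets the cell counts be recovered from the printed proportions (for example, 0.102132435 x 891 = 91). The object names `counts` and `rates` are illustrative only.

```r
# Cell counts recovered from the printed proportions,
# ASSUMING train.csv has 891 rows (proportion * 891 is a whole number for every cell).
counts <- array(
  c(3, 91, 6, 70, 72, 72,       # female: (died, survived) for Pclass 1, 2, 3
    77, 45, 91, 17, 300, 47),   # male:   (died, survived) for Pclass 1, 2, 3
  dim = c(2, 3, 2),
  dimnames = list(Survived = c("0", "1"),
                  Pclass   = c("1", "2", "3"),
                  Sex      = c("female", "male"))
)

# Normalise within each Pclass x Sex group, so every group sums to 1
rates <- prop.table(counts, margin = c(2, 3))

# Survival rate (Survived == 1) for each class and sex
round(rates["1", , ], 3)
```

On the real data, the same idea would be `prop.table(table(train$Survived, train$Pclass, train$Sex), margin = c(2, 3))`. Under the 891-row assumption, female first-class passengers survive at 91/94 (about 0.968) and male third-class passengers die at 300/347 (about 0.865).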
# distribution of passenger ages - histogram
ggplot(train, aes(x = Age)) +
theme_bw()+
geom_histogram(binwidth = 5)+
labs(y = "Passenger count",
x = "Age",
title = "Titanic Age distribution")
## Warning: Removed 177 rows containing non-finite values (stat_bin).
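The warning means that 177 passengers have no recorded age and are dropped from the plot. A small sketch of how the missing values could be counted before plotting follows. The `age` vector is a made-up stand-in for `train$Age`, not real data.

```r
# Toy vector standing in for train$Age (hypothetical values, not from the dataset)
age <- c(22, 38, NA, 35, NA, 54)

n_missing     <- sum(is.na(age))    # number of missing ages
share_missing <- mean(is.na(age))   # fraction of rows with a missing age

n_missing
share_missing
```

On the real data, `sum(is.na(train$Age))` should match the 177 rows reported in the warning.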
ggplot(train, aes(x = Age, fill = Survived)) +
theme_bw()+
geom_histogram(binwidth = 5)+
labs(y = "Passenger count",
x = "Age",
title = "Titanic survival rates by age")
## Warning: Removed 177 rows containing non-finite values (stat_bin).
# box and whisker plot
ggplot(train, aes(x = Survived, y = Age)) +
theme_bw()+
geom_boxplot()+
labs(y = "Age",
x = "Survived",
title = "Titanic survival rates by age")
## Warning: Removed 177 rows containing non-finite values (stat_boxplot).
# survival rates by age when segmented by sex and ticket class
# visualization - density plot
ggplot(train, aes(x = Age, fill = Survived)) +
theme_bw()+
facet_wrap(Sex ~ Pclass)+
geom_density(alpha = 0.5)+
labs(y = "Density",
x = "Age",
title = "Titanic survival rates by age, Pclass, Sex")
## Warning: Removed 177 rows containing non-finite values (stat_density).
# visualization - histogram
ggplot(train, aes(x = Age, fill = Survived)) +
theme_bw()+
facet_wrap(Sex ~ Pclass)+
geom_histogram()+
labs(y = "Passenger count",
x = "Age",
title = "Titanic survival rates by age, Pclass, Sex")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 177 rows containing non-finite values (stat_bin).
## The analysis is as follows:
1. Females had a higher survival rate than males.
2. About 62% of passengers died and about 38% survived the Titanic disaster.
3. More third-class passengers died than passengers in either of the other two classes.
4. Male third-class passengers had the highest death rate of any sex and class combination.
5. In second class, the male death rate was higher than the female death rate.
6. The highest survival rate was among female first-class passengers.