This is a report of the Titanic Assignment.
The first task is to read the Titanic dataset into R. The structure of the dataset and its first few rows are as follows:
setwd("~/Muyeena/Internship/Titanic")
titanic = read.csv("Titanic Data.csv")
#View(titanic)
str(titanic)
## 'data.frame': 889 obs. of 8 variables:
## $ Survived: int 0 1 1 1 0 0 0 0 1 1 ...
## $ Pclass : int 3 1 3 1 3 3 1 3 3 2 ...
## $ Sex : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
## $ Age : num 22 38 26 35 35 29.7 54 2 27 14 ...
## $ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
## $ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
## $ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
## $ Embarked: Factor w/ 3 levels "C","Q","S": 3 1 3 3 3 2 3 3 3 1 ...
head(titanic)
## Survived Pclass Sex Age SibSp Parch Fare Embarked
## 1 0 3 male 22.0 1 0 7.2500 S
## 2 1 1 female 38.0 1 0 71.2833 C
## 3 1 3 female 26.0 0 0 7.9250 S
## 4 1 1 female 35.0 1 0 53.1000 S
## 5 0 3 male 35.0 0 0 8.0500 S
## 6 0 3 male 29.7 0 0 8.4583 Q
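The str output above shows Sex and Embarked as factors. From R 4.0.0 onwards, read.csv no longer converts strings to factors by default, so reproducing that structure on a newer installation may need an explicit argument. The following is a minimal sketch under that assumption; the temporary file and the name toy are illustrative stand-ins for the real CSV, with rows copied from the head output.

```r
# Write a tiny stand-in CSV (three rows taken from the head() output above).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked",
  "0,3,male,22,1,0,7.25,S",
  "1,1,female,38,1,0,71.2833,C",
  "1,3,female,26,0,0,7.925,S"
), tmp)

# Ask explicitly for factors so the result does not depend on the R version.
toy <- read.csv(tmp, stringsAsFactors = TRUE)
str(toy)
is.factor(toy$Sex)
```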
This task is to find the total number of passengers. Since each row of the dataset represents one passenger, counting the rows with NROW gives the answer.
n = NROW(titanic)
n
## [1] 889
One can also use the describe function from the psych package to get a general summary of every column, including the number of observations n (in this case, the total number of passengers).
library(psych)
describe(titanic)
## vars n mean sd median trimmed mad min max range
## Survived 1 889 0.38 0.49 0.00 0.35 0.00 0.0 1.00 1.00
## Pclass 2 889 2.31 0.83 3.00 2.39 0.00 1.0 3.00 2.00
## Sex* 3 889 1.65 0.48 2.00 1.69 0.00 1.0 2.00 1.00
## Age 4 889 29.65 12.97 29.70 29.22 9.34 0.4 80.00 79.60
## SibSp 5 889 0.52 1.10 0.00 0.27 0.00 0.0 8.00 8.00
## Parch 6 889 0.38 0.81 0.00 0.19 0.00 0.0 6.00 6.00
## Fare 7 889 32.10 49.70 14.45 21.28 10.24 0.0 512.33 512.33
## Embarked* 8 889 2.54 0.79 3.00 2.67 0.00 1.0 3.00 2.00
## skew kurtosis se
## Survived 0.48 -1.77 0.02
## Pclass -0.63 -1.27 0.03
## Sex* -0.62 -1.61 0.02
## Age 0.43 0.96 0.43
## SibSp 3.68 17.69 0.04
## Parch 2.74 9.66 0.03
## Fare 4.79 33.23 1.67
## Embarked* -1.26 -0.23 0.03
Both approaches therefore give the same result: the dataset contains 889 passengers.
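As a side note on the row count, base R offers several equivalent ways to count the rows of a data frame. The sketch below uses a small made-up data frame (toy is a hypothetical name, not the Titanic data) to show where NROW and nrow differ.

```r
# Small illustrative data frame; not the real dataset.
toy <- data.frame(Survived = c(0, 1, 1), Pclass = c(3, 1, 3))

nrow(toy)      # number of rows of a data frame
NROW(toy)      # same result for a data frame
dim(toy)[1]    # first element of the dimensions

# The difference shows up on plain vectors:
nrow(toy$Survived)   # NULL, because a vector has no dim attribute
NROW(toy$Survived)   # treats the vector as a one-column matrix
```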
For this task, we need to generate a frequency table for the column Survived. This can be done with the table command.
freq.sur = table(titanic$Survived)
freq.sur
##
## 0 1
## 549 340
As shown, 340 passengers survived the sinking of the Titanic.
This task involves calculating the percentages of the frequency table obtained above. It can be achieved by using the prop.table command.
p.freq.sur = prop.table(freq.sur)*100
p.freq.sur
##
## 0 1
## 61.75478 38.24522
The percentage of passengers who survived is 38.2452193%, or about 38.25%.
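For reporting, the percentages are usually easier to read when rounded. A minimal sketch, assuming the counts 549 and 340 from the frequency table above; the names freq.sur.demo and pct are illustrative only.

```r
# Rebuild the one-way frequency table from the counts printed above.
freq.sur.demo <- as.table(c("0" = 549, "1" = 340))

# Convert to percentages and round to two decimal places.
pct <- round(prop.table(freq.sur.demo) * 100, 2)
pct

# Format the survivor share as text for use in a sentence.
sprintf("%.2f%%", as.numeric(pct["1"]))
```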
The number of first-class passengers who survived can be found by making a two-way table of passenger class against survival, using the xtabs command.
class.sur = xtabs(~ Pclass + Survived, data=titanic)
class.sur
## Survived
## Pclass 0 1
## 1 80 134
## 2 97 87
## 3 372 119
Therefore, 134 first-class passengers survived.
For this task, we need to calculate the row proportions of the table above. This can be achieved by calling prop.table with the margin argument set to 1.
p.class.sur = prop.table(class.sur, 1)*100
p.class.sur
## Survived
## Pclass 0 1
## 1 37.38318 62.61682
## 2 52.71739 47.28261
## 3 75.76375 24.23625
Therefore, 62.6168224% (about 62.62%) of first-class passengers survived.
This task involves a three-way contingency table. We use the ftable command to display the three-way table in a compact format.
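The row-wise calculation can be checked without the CSV by rebuilding the two-way table from the printed counts. This is only a sketch; class.sur.demo and row.pct are illustrative names.

```r
# Counts copied from the Pclass x Survived table above (filled column by column).
class.sur.demo <- matrix(
  c(80, 97, 372,     # Survived = 0 for Pclass 1, 2, 3
    134, 87, 119),   # Survived = 1 for Pclass 1, 2, 3
  nrow = 3,
  dimnames = list(Pclass = c("1", "2", "3"), Survived = c("0", "1"))
)

# margin = 1 divides each cell by its row total.
row.pct <- prop.table(class.sur.demo, margin = 1) * 100
rowSums(row.pct)      # every row sums to 100
row.pct["1", "1"]     # share of first-class passengers who survived
```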
sex.class.sur = xtabs(~ Pclass + Sex + Survived, data=titanic)
sex.class.sur = ftable(sex.class.sur)
sex.class.sur
## Survived 0 1
## Pclass Sex
## 1 female 3 89
## male 77 45
## 2 female 6 70
## male 91 17
## 3 female 72 72
## male 300 47
From the above table, 89 first-class female passengers survived.
To find the percentage of survivors who were female, we build a two-way table of sex against survival and calculate column-wise percentages:
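Rather than reading the cell off the printed table, a single count can be pulled out of a three-way table by indexing it with the level names. The sketch below rebuilds the table from the printed counts; counts and tab are illustrative names, and the indexing assumes the dimension order Pclass, Sex, Survived used in the xtabs formula.

```r
# Counts copied from the ftable output above; the first index (Pclass) varies fastest.
counts <- array(
  c(3, 6, 72,   77, 91, 300,    # Survived = 0: females, then males, for Pclass 1..3
    89, 70, 72, 45, 17, 47),    # Survived = 1: females, then males, for Pclass 1..3
  dim = c(3, 2, 2),
  dimnames = list(Pclass   = c("1", "2", "3"),
                  Sex      = c("female", "male"),
                  Survived = c("0", "1"))
)
tab <- as.table(counts)

tab["1", "female", "1"]   # first-class females who survived
ftable(tab)               # same compact layout as above
```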
sex.sur = xtabs(~ Sex + Survived, data = titanic)
sex.sur
## Survived
## Sex 0 1
## female 81 231
## male 468 109
p.sex.sur = prop.table(sex.sur, 2)*100 # margin = 2 gives column-wise percentages
p.sex.sur
## Survived
## Sex 0 1
## female 14.75410 67.94118
## male 85.24590 32.05882
Therefore, 67.9411765% (about 67.94%) of survivors were female.
For this task, we use the same table as above but calculate the percentages row-wise.
sex.sur.p = prop.table(sex.sur, 1)*100
sex.sur.p
## Survived
## Sex 0 1
## female 25.96154 74.03846
## male 81.10919 18.89081
Therefore, 74.0384615% (about 74.04%) of females survived.
The following hypothesis is to be tested using Pearson’s chi-squared test:
The proportion of females onboard who survived the sinking of the Titanic was higher than the proportion of males onboard who survived the sinking of the Titanic.
We use the following steps:
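The last two results come from the same counts but answer different questions: the column-wise figure conditions on survival, while the row-wise figure conditions on sex. A short sketch with the counts from the Sex by Survived table makes the distinction explicit; sex.sur.demo is an illustrative name.

```r
# Counts copied from the Sex x Survived table above (filled column by column).
sex.sur.demo <- matrix(
  c(81, 468,     # Survived = 0: female, male
    231, 109),   # Survived = 1: female, male
  nrow = 2,
  dimnames = list(Sex = c("female", "male"), Survived = c("0", "1"))
)

# Among survivors, what share was female? (divide by column totals)
among.survivors <- prop.table(sex.sur.demo, margin = 2)["female", "1"] * 100

# Among females, what share survived? (divide by row totals)
among.females <- prop.table(sex.sur.demo, margin = 1)["female", "1"] * 100

round(c(among.survivors = among.survivors, among.females = among.females), 2)
```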
chi.table = xtabs(~ Survived + Sex, data=titanic)
addmargins(chi.table)
## Sex
## Survived female male Sum
## 0 81 468 549
## 1 231 109 340
## Sum 312 577 889
chisq.test(chi.table)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: chi.table
## X-squared = 258.43, df = 1, p-value < 2.2e-16
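Because the p-value is very small (p < 0.01), we reject the assumption that survival is independent of sex. The chi-squared test does not itself indicate a direction, but the row percentages computed earlier (74.04% of females survived, against 18.89% of males) show that the difference lies in the hypothesized direction.
Therefore, the data support our hypothesis.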
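Because the hypothesis is directional, one possible complement to the chi-squared test is a one-sided two-sample test of proportions with prop.test from base R. This is a sketch of an alternative check, not the method used in the report; the vectors survived and onboard are illustrative names holding counts copied from the addmargins output.

```r
# Counts copied from the table with margins above.
survived <- c(female = 231, male = 109)   # survivors of each sex
onboard  <- c(female = 312, male = 577)   # passengers of each sex in the data

# alternative = "greater": the first proportion (female) exceeds the second (male).
res <- prop.test(x = survived, n = onboard, alternative = "greater")

res$estimate   # sample survival proportions, female first
res$p.value    # one-sided p-value
```

A small one-sided p-value here would point the same way as the chi-squared result while also addressing the direction of the difference.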