R Markdown

This is a report of the Titanic Assignment.

Task 2B : Reading the dataset

This task involves reading the Titanic dataset into R. The structure of the dataset and its first few rows are as follows:

setwd("~/Muyeena/Internship/Titanic")
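# stringsAsFactors = TRUE keeps Sex and Embarked as factors (as in the output below) on R 4.0 and later
titanic = read.csv("Titanic Data.csv", stringsAsFactors = TRUE)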
#View(titanic)
str(titanic)
## 'data.frame':    889 obs. of  8 variables:
##  $ Survived: int  0 1 1 1 0 0 0 0 1 1 ...
##  $ Pclass  : int  3 1 3 1 3 3 1 3 3 2 ...
##  $ Sex     : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
##  $ Age     : num  22 38 26 35 35 29.7 54 2 27 14 ...
##  $ SibSp   : int  1 1 0 1 0 0 0 3 0 1 ...
##  $ Parch   : int  0 0 0 0 0 0 0 1 2 0 ...
##  $ Fare    : num  7.25 71.28 7.92 53.1 8.05 ...
##  $ Embarked: Factor w/ 3 levels "C","Q","S": 3 1 3 3 3 2 3 3 3 1 ...
head(titanic)
##   Survived Pclass    Sex  Age SibSp Parch    Fare Embarked
## 1        0      3   male 22.0     1     0  7.2500        S
## 2        1      1 female 38.0     1     0 71.2833        C
## 3        1      3 female 26.0     0     0  7.9250        S
## 4        1      1 female 35.0     1     0 53.1000        S
## 5        0      3   male 35.0     0     0  8.0500        S
## 6        0      3   male 29.7     0     0  8.4583        Q

Task 3a : Use R to count the total number of passengers on board the Titanic.

This task involves finding the total number of passengers. Each row of the dataset describes one passenger, so the number of rows, returned by NROW, is the passenger count.

n = NROW(titanic)
n
## [1] 889
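
As a side note on the command used above, the following minimal sketch compares `NROW` with the related base R functions `nrow` and `dim`. The data frame `toy` is a hypothetical three-row stand-in, not the Titanic data.

```r
# Hypothetical three-row data frame, used only to illustrate the calls.
toy <- data.frame(Survived = c(0, 1, 1), Pclass = c(3, 1, 3))

nrow(toy)            # number of rows of a data frame or matrix
NROW(toy)            # same result here
dim(toy)             # rows and columns together
NROW(toy$Survived)   # NROW also works on a plain vector
nrow(toy$Survived)   # nrow() returns NULL for a plain vector
```

For a data frame the two functions agree, so either would serve for this task.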

The describe function from the psych package also gives a general summary of every column in the data, including the number of observations n (in this case, the total number of passengers).

library(psych)
describe(titanic)
##           vars   n  mean    sd median trimmed   mad min    max  range
## Survived     1 889  0.38  0.49   0.00    0.35  0.00 0.0   1.00   1.00
## Pclass       2 889  2.31  0.83   3.00    2.39  0.00 1.0   3.00   2.00
## Sex*         3 889  1.65  0.48   2.00    1.69  0.00 1.0   2.00   1.00
## Age          4 889 29.65 12.97  29.70   29.22  9.34 0.4  80.00  79.60
## SibSp        5 889  0.52  1.10   0.00    0.27  0.00 0.0   8.00   8.00
## Parch        6 889  0.38  0.81   0.00    0.19  0.00 0.0   6.00   6.00
## Fare         7 889 32.10 49.70  14.45   21.28 10.24 0.0 512.33 512.33
## Embarked*    8 889  2.54  0.79   3.00    2.67  0.00 1.0   3.00   2.00
##            skew kurtosis   se
## Survived   0.48    -1.77 0.02
## Pclass    -0.63    -1.27 0.03
## Sex*      -0.62    -1.61 0.02
## Age        0.43     0.96 0.43
## SibSp      3.68    17.69 0.04
## Parch      2.74     9.66 0.03
## Fare       4.79    33.23 1.67
## Embarked* -1.26    -0.23 0.03

Both approaches give the same result: the total number of passengers on board the Titanic is 889.


Task 3b : Use R to count the number of passengers who survived the sinking of the Titanic

For this task, we need a frequency table for the Survived column, where 1 marks a survivor. The table command generates it.

freq.sur = table(titanic$Survived)
freq.sur
## 
##   0   1 
## 549 340

As seen above, the total number of passengers who survived the sinking of the Titanic is 340.
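
The same count can be cross-checked without a frequency table by summing a logical condition. This is only a sketch: the vector `survived` is a stand-in rebuilt from the two counts printed above, not the original column.

```r
# Stand-in for titanic$Survived, rebuilt from the table above:
# 549 zeros (did not survive) followed by 340 ones (survived).
survived <- rep(c(0, 1), times = c(549, 340))

# TRUE counts as 1 when summed, so this counts the survivors.
n.survived <- sum(survived == 1)
n.survived

# The table-based route from the report gives the same number.
table(survived)[["1"]]
```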


Task 3c : Use R to measure the percentage of passengers who survived the sinking of the Titanic.

This task involves converting the frequency table obtained above into percentages. The prop.table command turns the counts into proportions, which are then multiplied by 100.

p.freq.sur = prop.table(freq.sur)*100
p.freq.sur
## 
##        0        1 
## 61.75478 38.24522

The percentage of passengers who survived is 38.25% (340 of 889).
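
For reporting, the percentages are easier to read when rounded. A minimal sketch, assuming two decimal places are enough; the named vector `counts` is a stand-in that copies the frequencies from Task 3b.

```r
# Frequencies copied from the Task 3b table (0 = died, 1 = survived).
counts <- c("0" = 549, "1" = 340)

# prop.table() divides each entry by the total; round() trims the result.
pct <- round(prop.table(counts) * 100, 2)
pct

pct[["1"]]   # percentage of survivors: 38.25
```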


Task 3d : Use R to count the number of first-class passengers who survived the sinking of the Titanic.

This value can be found by making a two-way table, using the xtabs command.

class.sur = xtabs(~ Pclass + Survived, data=titanic)
class.sur
##       Survived
## Pclass   0   1
##      1  80 134
##      2  97  87
##      3 372 119

Therefore, the number of first-class passengers who survived is 134.
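
Rather than reading the value off the printed table, a single cell can be extracted by its row and column labels. The sketch below rebuilds the table from the counts shown above, so that it runs without the CSV file; with the real data, `class.sur` from `xtabs` can be indexed the same way.

```r
# Two-way table rebuilt from the printed counts (filled column by column).
class.sur <- as.table(matrix(
  c(80, 97, 372,     # Survived = 0 for Pclass 1, 2, 3
    134, 87, 119),   # Survived = 1 for Pclass 1, 2, 3
  nrow = 3,
  dimnames = list(Pclass = c("1", "2", "3"), Survived = c("0", "1"))
))

# Labels are character strings, so "1" selects by name, not by position.
first.class.survivors <- class.sur["1", "1"]
first.class.survivors   # 134
```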


Task 3e : Use R to measure the percentage of first-class passengers who survived the sinking of the Titanic.

For this task, we need the row proportions of the table above, so that each class sums to 100%. These are obtained by calling prop.table with the margin argument set to 1.

p.class.sur = prop.table(class.sur, 1)*100
p.class.sur
##       Survived
## Pclass        0        1
##      1 37.38318 62.61682
##      2 52.71739 47.28261
##      3 75.76375 24.23625

Therefore, 62.62% of first-class passengers survived.
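
The row percentage can be checked by hand from the first-class counts in Task 3d. The variable names below are hypothetical and chosen only for this sketch.

```r
# First-class counts taken from the Task 3d table.
died  <- 80
lived <- 134

# Row percentage: survivors divided by everyone in that class.
pct.first <- lived / (died + lived) * 100
round(pct.first, 2)   # 62.62
```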


Task 3f : Use R to count the number of females from First-Class who survived the sinking of the Titanic

This task uses a three-way contingency table. We use the ftable command to display the three-way table in a compact format.

sex.class.sur = xtabs(~ Pclass + Sex + Survived, data=titanic)
sex.class.sur = ftable(sex.class.sur)
sex.class.sur
##               Survived   0   1
## Pclass Sex                    
## 1      female            3  89
##        male             77  45
## 2      female            6  70
##        male             91  17
## 3      female           72  72
##        male            300  47

From the above table, 89 females from first class survived.
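
A three-way table can also be indexed by its labels, one per dimension. In this sketch the array `counts` is rebuilt from the figures printed above; with the real data, the `xtabs` result would be indexed the same way before it is passed to `ftable`.

```r
# Three-way array rebuilt from the printed table.
# Fill order: Pclass varies fastest, then Sex, then Survived.
counts <- array(
  c(3, 6, 72,      # Survived = 0, female, Pclass 1..3
    77, 91, 300,   # Survived = 0, male,   Pclass 1..3
    89, 70, 72,    # Survived = 1, female, Pclass 1..3
    45, 17, 47),   # Survived = 1, male,   Pclass 1..3
  dim = c(3, 2, 2),
  dimnames = list(Pclass = c("1", "2", "3"),
                  Sex = c("female", "male"),
                  Survived = c("0", "1"))
)

# One label per dimension: first class, female, survived.
counts["1", "female", "1"]   # 89
```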


Task 3g : Use R to measure the percentage of survivors who were female

This task requires the following steps:

  1. Form a two-way table between the “Sex” and “Survived” columns.
  2. Calculate the column-wise percentages of the two-way table.

sex.sur = xtabs(~ Sex + Survived, data = titanic)
sex.sur
##         Survived
## Sex        0   1
##   female  81 231
##   male   468 109
p.sex.sur = prop.table(sex.sur, 2)*100  # margin 2 gives column-wise percentages
p.sex.sur
##         Survived
## Sex             0        1
##   female 14.75410 67.94118
##   male   85.24590 32.05882

Therefore, the percentage of survivors who were female is 67.94%.


Task 3h : Use R to measure the percentage of females on board the Titanic who survived

For this task, we use the same table as above, but calculate the percentages row-wise, so that each sex sums to 100%.

sex.sur.p = prop.table(sex.sur, 1)*100
sex.sur.p
##         Survived
## Sex             0        1
##   female 25.96154 74.03846
##   male   81.10919 18.89081

Therefore, the percentage of females on board who survived is 74.04%.
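
Tasks 3g and 3h differ only in the denominator, which the margin argument of `prop.table` controls. The sketch below puts the two side by side; the table is rebuilt from the counts printed in Task 3g, and the names `by.row` and `by.col` are hypothetical.

```r
# Sex by Survived table rebuilt from the printed counts.
sex.sur <- as.table(matrix(
  c(81, 468,     # Survived = 0: female, male
    231, 109),   # Survived = 1: female, male
  nrow = 2,
  dimnames = list(Sex = c("female", "male"), Survived = c("0", "1"))
))

by.col <- prop.table(sex.sur, 2) * 100   # each column sums to 100
by.row <- prop.table(sex.sur, 1) * 100   # each row sums to 100

# Share of survivors who were female: 231 / 340
round(by.col["female", "1"], 2)   # 67.94

# Share of females who survived: 231 / 312
round(by.row["female", "1"], 2)   # 74.04
```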


Task 3i : Run a Pearson’s Chi-squared test

The following hypothesis is to be tested using Pearson’s Chi-squared test:

The proportion of females onboard who survived the sinking of the Titanic was higher than the proportion of males onboard who survived the sinking of the Titanic

We use the following steps :

  • Construct a two-way contingency table between the Sex and Survived columns
  • Run the chisq.test function on the table
  • Examine the p-value obtained and draw a conclusion.

chi.table = xtabs(~ Survived + Sex, data=titanic)
addmargins(chi.table)
##         Sex
## Survived female male Sum
##      0       81  468 549
##      1      231  109 340
##      Sum    312  577 889
chisq.test(chi.table)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  chi.table
## X-squared = 258.43, df = 1, p-value < 2.2e-16

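As the p-value is very small (p < 0.01), we reject the null hypothesis that survival is independent of sex. The Chi-squared test only shows that the two are related; the direction of the relationship comes from the row percentages in Task 3h, where 74.04% of females survived against 18.89% of males.

Therefore, the data support our hypothesis.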
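
Because the hypothesis is directional while the Chi-squared test is not, one possible complement is a one-sided test of two proportions. The sketch below uses base R's `prop.test`, which is not part of the original assignment; the counts are copied from the contingency table above, and the names `survivors`, `totals` and `res` are hypothetical.

```r
# Survivors and group sizes taken from the table in Task 3i.
survivors <- c(female = 231, male = 109)
totals    <- c(female = 312, male = 577)

# alternative = "greater" tests whether the first proportion (female)
# exceeds the second (male).
res <- prop.test(x = survivors, n = totals, alternative = "greater")

res$estimate   # the two sample proportions
res$p.value    # one-sided p-value
```

A small p-value here would point in the same direction as the row percentages, with females surviving at the higher rate.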