Now that we have our data sets set up, let’s take a look at them and start to analyze what happened. (And learn some R basics along the way.)
Let’s focus on the SS Vestris.
Start by loading the dataset that you previously saved.
load(paste("~","data/SS Vestris.Rda", sep="/"))
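A side note on what load() does: it restores objects under the names they had when they were saved, regardless of the file name. That is presumably why a file called "SS Vestris.Rda" gives an object called SS_Vestris. A minimal sketch with a made-up data frame and a temporary file (both hypothetical, not the course data):

```r
# Hypothetical toy data frame standing in for a ship data set
toy_ship <- data.frame(Id = 1:3, Survival = c(1, 0, 1))

# Save it to a temporary file whose name differs from the object name
rda_path <- file.path(tempdir(), "toy ship.Rda")
save(toy_ship, file = rda_path)

# Remove the object, then restore it from disk
rm(toy_ship)
loaded_names <- load(rda_path)  # load() returns the names of the restored objects

print(loaded_names)
str(toy_ship)
```

Checking the value returned by load() is a quick way to find out what a saved file actually contains.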
The data we downloaded contains one row per person on board a given ship, in this case the SS Vestris.
Get some basic information about the SS Vestris data set by calling the str() and summary() functions on it. That means putting the name of the data set inside the parentheses: str(SS_Vestris) and summary(SS_Vestris).
str(SS_Vestris)
## Classes 'tbl_df', 'tbl' and 'data.frame': 328 obs. of 20 variables:
## $ Id_12 : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Ship Id : num 12 12 12 12 12 12 12 12 12 12 ...
## $ Year : num 1928 1928 1928 1928 1928 ...
## $ Nationality of the Ship : chr "U.K" "U.K" "U.K" "U.K" ...
## $ Women and children first : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Quick : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Cause : chr "Weather" "Weather" "Weather" "Weather" ...
## $ No. of passengers : num 113 113 113 113 113 113 113 113 113 113 ...
## $ No. of women passengers : num 38 38 38 38 38 38 38 38 38 38 ...
## $ Women passengers/passengers: num 0.336 0.336 0.336 0.336 0.336 0.336 0.336 0.336 0.336 0.336 ...
## $ Ship size : num 308 308 308 308 308 308 308 308 308 308 ...
## $ Length of voyage : num 2 2 2 2 2 2 2 2 2 2 ...
## $ Gender : num NA NA 0 0 1 0 0 0 1 0 ...
## $ Age : num NA NA NA NA NA NA NA NA NA NA ...
## $ Child : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Crew : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Passenger Class : num NA NA NA NA NA NA NA NA NA NA ...
## $ Nationality of Passenger : num NA NA NA NA NA NA NA NA NA NA ...
## $ Companionship : num NA NA NA NA NA NA NA NA NA NA ...
## $ Survival : num 1 1 1 1 1 1 1 1 1 1 ...
summary(SS_Vestris)
## Id_12 Ship Id Year Nationality of the Ship
## Min. : 1.00 Min. :12 Min. :1928 Length:328
## 1st Qu.: 82.75 1st Qu.:12 1st Qu.:1928 Class :character
## Median :164.50 Median :12 Median :1928 Mode :character
## Mean :164.50 Mean :12 Mean :1928
## 3rd Qu.:246.25 3rd Qu.:12 3rd Qu.:1928
## Max. :328.00 Max. :12 Max. :1928
##
## Women and children first Quick Cause No. of passengers
## Min. :0 Min. :0 Length:328 Min. :113
## 1st Qu.:0 1st Qu.:0 Class :character 1st Qu.:113
## Median :0 Median :0 Mode :character Median :113
## Mean :0 Mean :0 Mean :113
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:113
## Max. :0 Max. :0 Max. :113
##
## No. of women passengers Women passengers/passengers Ship size
## Min. :38 Min. :0.336 Min. :308
## 1st Qu.:38 1st Qu.:0.336 1st Qu.:308
## Median :38 Median :0.336 Median :308
## Mean :38 Mean :0.336 Mean :308
## 3rd Qu.:38 3rd Qu.:0.336 3rd Qu.:308
## Max. :38 Max. :0.336 Max. :308
##
## Length of voyage Gender Age Child
## Min. :2 Min. :0.0000 Min. : NA Min. :0.00000
## 1st Qu.:2 1st Qu.:0.0000 1st Qu.: NA 1st Qu.:0.00000
## Median :2 Median :0.0000 Median : NA Median :0.00000
## Mean :2 Mean :0.1331 Mean :NaN Mean :0.03659
## 3rd Qu.:2 3rd Qu.:0.0000 3rd Qu.: NA 3rd Qu.:0.00000
## Max. :2 Max. :1.0000 Max. : NA Max. :1.00000
## NA's :20 NA's :328
## Crew Passenger Class Nationality of Passenger Companionship
## Min. :0.0000 Min. : NA Min. : NA Min. : NA
## 1st Qu.:0.0000 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA
## Median :1.0000 Median : NA Median : NA Median : NA
## Mean :0.5976 Mean :NaN Mean :NaN Mean :NaN
## 3rd Qu.:1.0000 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA
## Max. :1.0000 Max. : NA Max. : NA Max. : NA
## NA's :328 NA's :328 NA's :328
## Survival
## Min. :0.0000
## 1st Qu.:0.0000
## Median :1.0000
## Mean :0.6128
## 3rd Qu.:1.0000
## Max. :1.0000
##
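You should notice some odd things: the ship-level variables (Year, Cause, No. of passengers, and so on) repeat the same value on every row, Gender is missing for 20 people, and Age, Passenger Class, Nationality of Passenger, and Companionship are missing for all 328.
We need to pay attention to the fact that some variables are not available for every ship. When a variable is not available for a ship, all of its values will be missing.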
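One way to spot unavailable variables without reading the whole summary is to count the missing values in each column. A minimal sketch on a made-up data frame (the values are hypothetical; only the column names echo the ship data):

```r
# Hypothetical toy data: Age is missing for everyone, Gender for one person
toy <- data.frame(
  Gender   = c(NA, 0, 1, 0),
  Age      = c(NA_real_, NA_real_, NA_real_, NA_real_),
  Survival = c(1, 1, 0, 1)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Columns that are missing on every row, i.e. not available for this ship
all_missing <- names(na_counts)[na_counts == nrow(toy)]
print(all_missing)
```

The same two lines should work on a real ship data set by replacing toy with its name.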
Now let’s get some descriptive information about the people on the ship using the crosstab function. The variables we are interested in in the SS_Vestris data set are: Gender, Crew, and Survival.
#Here is the first. You add the others.
crosstab(SS_Vestris, row.vars = "Gender")
##
## Gender Count Total %
## 0 267.00 86.69
## 1 41.00 13.31
## Sum 308.00 100.00
crosstab(SS_Vestris, row.vars = "Crew")
##
## Crew Count Total %
## 0 132.00 40.24
## 1 196.00 59.76
## Sum 328.00 100.00
crosstab(SS_Vestris, row.vars = "Survival")
##
## Survival Count Total %
## 0 127.00 38.72
## 1 201.00 61.28
## Sum 328.00 100.00
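crosstab() is not part of base R, so it was presumably defined or loaded in an earlier step. As a cross-check, the same counts and percentages can be reproduced with base R's table() and prop.table(). The sketch below rebuilds the Survival column from the counts printed above (127 zeros, 201 ones) rather than reading the real data set:

```r
# Rebuild a Survival vector from the counts in the crosstab output above
survival <- rep(c(0, 1), times = c(127, 201))

counts  <- table(survival)                      # one-way frequency table
percent <- round(100 * prop.table(counts), 2)   # share of the total, in percent

result <- data.frame(
  Survival = names(counts),
  Count    = as.vector(counts),
  Percent  = as.vector(percent)
)
print(result)
```

Note that table() drops missing values by default, which matches the Gender crosstab summing to 308 rather than 328; passing useNA = "ifany" would list the missing values as their own row.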
We would also like to know the proportion who are 15 and under. On some ships the Child variable is entirely missing while Age is given in years; in that case, update the Child variable with information from the Age variable and then run the crosstab. For the SS Vestris the situation is reversed: Age is missing for all 328 people, so nothing can be derived from it, but Child is already filled in (the summary above shows no NA's for it), so the crosstab can be run on Child directly. (For your other ship you need to see if this is necessary and possible. You may need to do other similar things with other variables.)
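For a ship where Age is recorded, the Child indicator can be derived from it. A minimal sketch on made-up ages (hypothetical values, not from any ship), with a guard so that an entirely missing Age column, as on the SS Vestris, does not overwrite an existing Child column:

```r
# Hypothetical toy data; Age has one missing value and Child is empty
toy <- data.frame(Age = c(4, 15, 16, NA, 40),
                  Child = c(NA, NA, NA, NA, NA))

if (all(is.na(toy$Age))) {
  # Nothing to derive from; keep whatever Child already holds
  message("Age is entirely missing; leaving Child unchanged.")
} else {
  # 1 if aged 15 or under, 0 otherwise; stays NA where Age is NA
  toy$Child <- as.numeric(as.numeric(toy$Age) <= 15)
}

print(table(toy$Child, useNA = "ifany"))
```

Wrapping the comparison in as.numeric() keeps Child coded as 0/1, like the other indicator variables, instead of TRUE/FALSE.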
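# Add the crosstab below
# Age is NA on every row for the SS Vestris, so the next line would overwrite Child with NA; it is left commented out.
# SS_Vestris$Child <- as.numeric(as.numeric(SS_Vestris$Age) <= 15)
crosstab(SS_Vestris, row.vars = "Child")
Write a paragraph describing the people who were on board the SS Vestris and what happened to them.
The data set lists 328 people on board the SS Vestris: 196 crew members (59.76%) and 132 others (40.24%). The ship-level variables report 113 passengers, 38 of them women. Gender is recorded for 308 people, 267 coded 0 and 41 coded 1. Overall, 201 people (61.28%) survived and 127 (38.72%) did not. Whether women survived at a higher rate than men cannot be read from these one-way tables; that comparison needs a crosstab of Gender by Survival.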
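A statement comparing survival between groups needs a two-way table rather than the one-way tables above. A minimal base-R sketch on made-up values (hypothetical; these are not the SS Vestris figures, and the meaning of the 0/1 Gender codes is not documented in this section):

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(
  Gender   = c(0, 0, 0, 0, 1, 1, 1, NA),
  Survival = c(1, 1, 0, 1, 0, 1, 0, 1)
)

# Counts of Survival within each Gender code (rows with missing Gender are dropped)
two_way <- table(Gender = toy$Gender, Survival = toy$Survival)
print(two_way)

# Row percentages: the survival split within each Gender code
row_pct <- round(100 * prop.table(two_way, margin = 1), 1)
print(row_pct)
```

With margin = 1 each row sums to 100, so the Survival = 1 column can be compared directly across the Gender codes.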