Now that our data sets are set up, let’s take a look at them and start to analyze what happened. (And learn some R basics along the way.)
Let’s focus on the SS Princess Alice.
Start by loading the dataset that you previously saved.
load("data/SS_Princess_Alice.Rda")
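A note on what load() does: it restores objects under the names they had when they were saved, so there is nothing to assign. The sketch below shows this with a small made-up data frame (SS_Toy and its values are hypothetical, not part of the real data) saved to a temporary file.

```r
# Toy stand-in for the real data set (values are made up for illustration)
SS_Toy <- data.frame(Age = c(29, 33, NA), Survival = c(0, 0, 1))

# Save it to a temporary .Rda file, then remove it from the workspace
toy_path <- tempfile(fileext = ".Rda")
save(SS_Toy, file = toy_path)
rm(SS_Toy)

# load() recreates the object under its original name and
# invisibly returns the names of the objects it restored
restored <- load(toy_path)
restored
exists("SS_Toy")
```

Capturing the return value of load() is a handy way to find out which objects an unfamiliar .Rda file contains.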
The same characteristics were recorded for this ship as in the other data sets. The Gender variable has some values that are not available. Passenger Class, Nationality of Passenger, and Companionship are not available at all.
Find some basic information about the SS Princess Alice using the str() function and the summary() function on the SS_Princess_Alice data set. That means, put the name of the data set inside the parentheses.
str(SS_Princess_Alice)
## Classes 'tbl_df', 'tbl' and 'data.frame': 898 obs. of 20 variables:
## $ Id_6 : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Ship Id : num 6 6 6 6 6 6 6 6 6 6 ...
## $ Year : num 1878 1878 1878 1878 1878 ...
## $ Nationality of the Ship : chr "U.K" "U.K" "U.K" "U.K" ...
## $ Women and children first : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Quick : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Cause : chr "Collision" "Collision" "Collision" "Collision" ...
## $ No. of passengers : num 801 801 801 801 801 801 801 801 801 801 ...
## $ No. of women passengers : num 455 455 455 455 455 455 455 455 455 455 ...
## $ Women passengers/passengers: num 0.568 0.568 0.568 0.568 0.568 0.568 0.568 0.568 0.568 0.568 ...
## $ Ship size : num 837 837 837 837 837 837 837 837 837 837 ...
## $ Length of voyage : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Gender : num 0 1 NA NA 1 1 1 0 0 1 ...
## $ Age : num 29 33 NA NA 65 16 33 35 33 0 ...
## $ Child : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Crew : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Passenger Class : num NA NA NA NA NA NA NA NA NA NA ...
## $ Nationality of Passenger : num NA NA NA NA NA NA NA NA NA NA ...
## $ Companionship : num NA NA NA NA NA NA NA NA NA NA ...
## $ Survival : num 0 0 1 1 0 0 0 0 0 0 ...
summary(SS_Princess_Alice)
## Id_6 Ship Id Year Nationality of the Ship
## Min. : 1.0 Min. :6 Min. :1878 Length:898
## 1st Qu.:224.5 1st Qu.:6 1st Qu.:1878 Class :character
## Median :448.0 Median :6 Median :1878 Mode :character
## Mean :448.0 Mean :6 Mean :1878
## 3rd Qu.:671.5 3rd Qu.:6 3rd Qu.:1878
## Max. :895.0 Max. :6 Max. :1878
## NA's :3
## Women and children first Quick Cause No. of passengers
## Min. :0 Min. :1 Length:898 Min. :801
## 1st Qu.:0 1st Qu.:1 Class :character 1st Qu.:801
## Median :0 Median :1 Mode :character Median :801
## Mean :0 Mean :1 Mean :801
## 3rd Qu.:0 3rd Qu.:1 3rd Qu.:801
## Max. :0 Max. :1 Max. :801
##
## No. of women passengers Women passengers/passengers Ship size
## Min. :455 Min. :0.568 Min. :837
## 1st Qu.:455 1st Qu.:0.568 1st Qu.:837
## Median :455 Median :0.568 Median :837
## Mean :455 Mean :0.568 Mean :837
## 3rd Qu.:455 3rd Qu.:0.568 3rd Qu.:837
## Max. :455 Max. :0.568 Max. :837
##
## Length of voyage Gender Age Child
## Min. :1 Min. :0.0000 Min. : 0.00 Min. :0.00000
## 1st Qu.:1 1st Qu.:0.0000 1st Qu.:15.00 1st Qu.:0.00000
## Median :1 Median :1.0000 Median :27.00 Median :0.00000
## Mean :1 Mean :0.5496 Mean :28.37 Mean :0.04134
## 3rd Qu.:1 3rd Qu.:1.0000 3rd Qu.:41.00 3rd Qu.:0.00000
## Max. :1 Max. :1.0000 Max. :92.00 Max. :1.00000
## NA's :61 NA's :317 NA's :3
## Crew Passenger Class Nationality of Passenger Companionship
## Min. :0.00000 Min. : NA Min. : NA Min. : NA
## 1st Qu.:0.00000 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA
## Median :0.00000 Median : NA Median : NA Median : NA
## Mean :0.04134 Mean :NaN Mean :NaN Mean :NaN
## 3rd Qu.:0.00000 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA
## Max. :1.00000 Max. : NA Max. : NA Max. : NA
## NA's :3 NA's :898 NA's :898 NA's :898
## Survival
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.1978
## 3rd Qu.:0.0000
## Max. :1.0000
## NA's :3
You should notice some odd things.
The variables that are NA in every row are Passenger Class, Nationality of Passenger, and Companionship. Some of the Gender values (61) and Age values (317) are also NA.
We need to pay attention to the fact that there are some variables not available for all of the ships. When a variable is not available, all values will be missing.
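Rather than reading the NA counts off the summary() output, they can be computed directly. This is a minimal sketch on a made-up data frame (toy is hypothetical and borrows only a few column names from the real data); the same two lines should work on SS_Princess_Alice.

```r
# Small made-up data frame with partly and fully missing columns
toy <- data.frame(
  Gender = c(0, 1, NA, NA, 1),
  Age = c(29, 33, NA, NA, 65),
  `Passenger Class` = rep(NA_real_, 5),
  check.names = FALSE  # keep the space in the column name
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
na_counts

# Columns where every value is missing
all_missing <- names(na_counts)[na_counts == nrow(toy)]
all_missing
```

A column listed in all_missing cannot be used for this ship at all, while a column with only some NAs can still be analyzed for the rows that have a value.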
The variables that have the same value in every row are the ones that characterize the ship. These include Ship Id, Year, Nationality of the Ship, Women and children first, Quick, Cause, the number of passengers, the number of women passengers, the share of women among passengers, ship size, and length of voyage.
When a variable has the same value for every observation, it describes the ship rather than an individual person.
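One way to spot ship-level variables without scanning the summary by eye is to check which columns contain exactly one distinct non-missing value. The sketch below uses a made-up data frame (toy is hypothetical); it is one possible approach, not the only one.

```r
# Made-up data: Year and Cause are constant, Age and Survival vary
toy <- data.frame(
  Year = c(1878, 1878, 1878, 1878),
  Cause = c("Collision", "Collision", "Collision", "Collision"),
  Age = c(29, 33, NA, 65),
  Survival = c(0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# TRUE for columns with exactly one distinct non-missing value
is_constant <- vapply(
  toy,
  function(x) length(unique(x[!is.na(x)])) == 1,
  logical(1)
)

# Names of the constant (ship-level) columns
constant_cols <- names(toy)[is_constant]
constant_cols
```

Dropping the NAs before counting distinct values means a column that is entirely missing is not reported as constant.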
For the SS Princess Alice the passenger class is not available. I do notice that the crew is not as enormous as on the RMS Titanic.
Now let’s get some descriptive information about the people on the ship by making crosstabs with the table() function. The variables we are interested in in the SS_Princess_Alice data set are: Gender, Passenger Class, Crew, Survival.
#Here is the first. You add the others.
table(SS_Princess_Alice$Gender)
##
## 0 1
## 377 460
table(SS_Princess_Alice$`Passenger Class`)
## < table of extent 0 >
table(SS_Princess_Alice$Crew)
##
## 0 1
## 858 37
table(SS_Princess_Alice$Survival)
##
## 0 1
## 718 177
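By default table() silently drops NA values, which is why the Gender counts add up to fewer than 898 rows. The sketch below uses the first ten Gender values shown in the str() output above to illustrate two optional extras: useNA = "ifany" to show the missing values, and prop.table() to turn counts into proportions.

```r
# First ten Gender values as printed by str() above
gender <- c(0, 1, NA, NA, 1, 1, 1, 0, 0, 1)

# Counts including an extra cell for missing values
counts <- table(gender, useNA = "ifany")
counts

# Proportions among the non-missing values only
props <- round(prop.table(table(gender)), 3)
props
```

Whether to include the NA cell depends on the question: proportions of known values leave it out, while a data-quality check keeps it in.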
We would also like to know about the proportion who are 15 and under. The problem is that the existing Child variable does not appear to match the Age data (its mean of about 0.04 is far lower than the ages imply, and for other ships it may be entirely missing), and age is given in years. Let’s update the Child variable with information from the Age variable and then run the crosstab. (For your other ship you need to see if this is necessary and possible. You may need to do other similar things with other variables.)
SS_Princess_Alice$Child <- as.numeric(SS_Princess_Alice$Age) <= 15
# Add the crosstab below
table(SS_Princess_Alice$Child)
##
## FALSE TRUE
## 430 151
The line SS_Princess_Alice$Child <- as.numeric(SS_Princess_Alice$Age) <= 15 replaces the Child variable with a logical value: TRUE for people aged 15 and under, FALSE for people older than 15, and NA where the age is not available.
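To see how the comparison treats missing ages, here is the same recode applied to the first ten Age values shown in the str() output above. The as.numeric() wrapper on the result is an optional extra, assuming you want 0/1 coding like the other indicator variables.

```r
# First ten Age values as printed by str() above
age <- c(29, 33, NA, NA, 65, 16, 33, 35, 33, 0)

# TRUE if 15 or under, FALSE if older, NA if the age is missing
child <- as.numeric(age) <= 15
child

# Crosstab that also shows how many people could not be classified
child_counts <- table(child, useNA = "ifany")
child_counts

# Optional: store as 0/1 instead of FALSE/TRUE
child_01 <- as.numeric(child)
child_01
```

Because a comparison with NA returns NA, people with an unknown age are left out of the FALSE/TRUE counts rather than being counted as adults.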
According to the data, women made up about 57% of the recorded passengers (455 of 801). The original Child variable flags only about 4% of people as children, but the recoded variable shows that 151 of the 581 people with a known age (about 26%) were 15 or under. Furthermore, ages ranged from 0 to 92. However, ages were not available for 317 people. The crew did not make up a big proportion of the people on board: only 37 of 895, or about 4%. Companionship was not available, so it is difficult to tell whether many families boarded the ship together. The Women and children first variable is 0 for every row, which suggests that no such order was recorded for this ship, and overall survival was only around 20%.
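The percentages quoted in this summary can be checked from the counts printed by the table() calls above. This is a small sketch; the helper pct() and the element names are hypothetical labels added for readability, and they assume that 1 means crew and survived, as the text does.

```r
# Counts copied from the table() output above
crew_counts     <- c(not_crew = 858, crew = 37)
survival_counts <- c(died = 718, survived = 177)
child_counts    <- c(over_15 = 430, age_15_or_under = 151)

# Convert a vector of counts to percentages of its total
pct <- function(counts) round(100 * counts / sum(counts), 1)

pct(crew_counts)      # crew share of people with a Crew value
pct(survival_counts)  # survival share of people with a Survival value
pct(child_counts)     # share aged 15 or under among known ages
```

Note that each percentage uses a different denominator: 895 people for Crew and Survival, but only 581 for Child, because 317 ages are missing.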