Now that we have our data sets set up, let’s take a look at them and start to analyze what happened. (And learn some R basics along the way.)

Let’s focus on the SS Princess Alice.

Start by loading the dataset that you previously saved.

  load("data/SS_Princess_Alice.Rda")
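
As a minimal sketch of what load() does, the example below saves a small made-up data frame to a temporary file and loads it back. The object name toy and the temporary path are assumptions for illustration only; they are not part of the lab data.

```r
# Hypothetical stand-in object; not the real ship data.
toy <- data.frame(Age = c(29, 33))

# Save it to a temporary .Rda file, then remove it from the workspace.
path <- tempfile(fileext = ".Rda")
save(toy, file = path)
rm(toy)

# load() restores objects under the names they were saved with
# and (invisibly) returns those names as a character vector.
loaded_names <- load(path)
print(loaded_names)
```

Because load() restores objects under their saved names, the data set appears in the workspace as SS_Princess_Alice without being assigned to anything.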

Describing the Data

The variables that describe the ship’s characteristics have the same value for every observation. The Gender variable has some values that are not available. Passenger Class, Nationality of Passenger, and Companionship are not available at all.

Find some basic information about the SS Princess Alice using the str() function and the summary() function on the SS_Princess_Alice data set. That means, put the name of the data set inside the parentheses.

str(SS_Princess_Alice)
## Classes 'tbl_df', 'tbl' and 'data.frame':    898 obs. of  20 variables:
##  $ Id_6                       : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Ship Id                    : num  6 6 6 6 6 6 6 6 6 6 ...
##  $ Year                       : num  1878 1878 1878 1878 1878 ...
##  $ Nationality of the Ship    : chr  "U.K" "U.K" "U.K" "U.K" ...
##  $ Women and children first   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Quick                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Cause                      : chr  "Collision" "Collision" "Collision" "Collision" ...
##  $ No. of passengers          : num  801 801 801 801 801 801 801 801 801 801 ...
##  $ No. of women passengers    : num  455 455 455 455 455 455 455 455 455 455 ...
##  $ Women passengers/passengers: num  0.568 0.568 0.568 0.568 0.568 0.568 0.568 0.568 0.568 0.568 ...
##  $ Ship size                  : num  837 837 837 837 837 837 837 837 837 837 ...
##  $ Length of voyage           : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Gender                     : num  0 1 NA NA 1 1 1 0 0 1 ...
##  $ Age                        : num  29 33 NA NA 65 16 33 35 33 0 ...
##  $ Child                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Crew                       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Passenger Class            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Nationality of Passenger   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Companionship              : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Survival                   : num  0 0 1 1 0 0 0 0 0 0 ...
summary(SS_Princess_Alice)
##       Id_6          Ship Id       Year      Nationality of the Ship
##  Min.   :  1.0   Min.   :6   Min.   :1878   Length:898             
##  1st Qu.:224.5   1st Qu.:6   1st Qu.:1878   Class :character       
##  Median :448.0   Median :6   Median :1878   Mode  :character       
##  Mean   :448.0   Mean   :6   Mean   :1878                          
##  3rd Qu.:671.5   3rd Qu.:6   3rd Qu.:1878                          
##  Max.   :895.0   Max.   :6   Max.   :1878                          
##  NA's   :3                                                         
##  Women and children first     Quick      Cause           No. of passengers
##  Min.   :0                Min.   :1   Length:898         Min.   :801      
##  1st Qu.:0                1st Qu.:1   Class :character   1st Qu.:801      
##  Median :0                Median :1   Mode  :character   Median :801      
##  Mean   :0                Mean   :1                      Mean   :801      
##  3rd Qu.:0                3rd Qu.:1                      3rd Qu.:801      
##  Max.   :0                Max.   :1                      Max.   :801      
##                                                                           
##  No. of women passengers Women passengers/passengers   Ship size  
##  Min.   :455             Min.   :0.568               Min.   :837  
##  1st Qu.:455             1st Qu.:0.568               1st Qu.:837  
##  Median :455             Median :0.568               Median :837  
##  Mean   :455             Mean   :0.568               Mean   :837  
##  3rd Qu.:455             3rd Qu.:0.568               3rd Qu.:837  
##  Max.   :455             Max.   :0.568               Max.   :837  
##                                                                   
##  Length of voyage     Gender            Age            Child        
##  Min.   :1        Min.   :0.0000   Min.   : 0.00   Min.   :0.00000  
##  1st Qu.:1        1st Qu.:0.0000   1st Qu.:15.00   1st Qu.:0.00000  
##  Median :1        Median :1.0000   Median :27.00   Median :0.00000  
##  Mean   :1        Mean   :0.5496   Mean   :28.37   Mean   :0.04134  
##  3rd Qu.:1        3rd Qu.:1.0000   3rd Qu.:41.00   3rd Qu.:0.00000  
##  Max.   :1        Max.   :1.0000   Max.   :92.00   Max.   :1.00000  
##                   NA's   :61       NA's   :317     NA's   :3        
##       Crew         Passenger Class Nationality of Passenger Companionship
##  Min.   :0.00000   Min.   : NA     Min.   : NA              Min.   : NA  
##  1st Qu.:0.00000   1st Qu.: NA     1st Qu.: NA              1st Qu.: NA  
##  Median :0.00000   Median : NA     Median : NA              Median : NA  
##  Mean   :0.04134   Mean   :NaN     Mean   :NaN              Mean   :NaN  
##  3rd Qu.:0.00000   3rd Qu.: NA     3rd Qu.: NA              3rd Qu.: NA  
##  Max.   :1.00000   Max.   : NA     Max.   : NA              Max.   : NA  
##  NA's   :3         NA's   :898     NA's   :898              NA's   :898  
##     Survival     
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.0000  
##  Mean   :0.1978  
##  3rd Qu.:0.0000  
##  Max.   :1.0000  
##  NA's   :3

You should notice some odd things.

Some variables are all set to NA. Which ones are these?

The variables that are all set to NA are Passenger Class, Nationality of Passenger, and Companionship. Some values of Gender (61) and Age (317) are also NA.

We need to pay attention to the fact that there are some variables not available for all of the ships. When a variable is not available, all values will be missing.
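
One way to check this without reading the whole summary is to count the missing values in each column. The sketch below uses a small made-up data frame (toy) in place of SS_Princess_Alice; the values are invented for illustration.

```r
# Toy data frame standing in for SS_Princess_Alice (values are made up).
toy <- data.frame(
  Gender = c(0, 1, NA, NA, 1),
  Age = c(29, 33, NA, NA, 65),
  Companionship = c(NA_real_, NA, NA, NA, NA),
  check.names = FALSE
)

# Number of missing values in each column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Names of the columns where every value is missing.
all_missing <- names(na_counts)[na_counts == nrow(toy)]
print(all_missing)
```

Replacing toy with SS_Princess_Alice should apply the same check to the real data.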

There are some variables where all the values are the same. Which are these?

The variables that are the same are the variables that characterize the ship. This includes Ship Id, Year, Nationality of the Ship, Women and children first, Quick, Cause, No. of passengers, No. of women passengers, Women passengers/passengers, Ship size, and Length of voyage.

Why do some variables have the same value for every observation? Think about what they refer to.

A variable that has the same value for every observation describes the ship itself rather than an individual person, so it cannot vary within one ship’s data set.
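
The constant variables can also be found programmatically. This is a rough sketch on a made-up data frame (toy); it treats a column as constant when it has exactly one distinct non-missing value, which is an assumption about how missing values should be handled.

```r
# Toy data frame standing in for SS_Princess_Alice (values are made up).
toy <- data.frame(
  `Ship Id` = c(6, 6, 6, 6),
  Year = c(1878, 1878, 1878, 1878),
  Age = c(29, 33, NA, 65),
  Survival = c(0, 0, 1, 1),
  check.names = FALSE
)

# TRUE when a column has exactly one distinct non-missing value.
is_constant <- sapply(toy, function(x) length(unique(na.omit(x))) == 1)

# Names of the constant columns.
constant_vars <- names(toy)[is_constant]
print(constant_vars)
```

Columns that are entirely NA have zero distinct values, so this check does not report them as constant.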

Browse the dataset. There is something interesting about how the Crew and Passenger Class variables relate to each other. What is that?

For the SS Princess Alice the passenger class is not available, so the relationship between Crew and Passenger Class cannot be checked. I do notice that the crew is not as enormous as that of the RMS Titanic.
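
For a ship where Passenger Class is recorded, a two-way table is one way to see how two variables relate. The sketch below uses invented values in a toy data frame; they do not describe any real ship and only show the mechanics, including how useNA = "ifany" keeps missing values visible.

```r
# Toy data frame with invented values (not from any real ship).
toy <- data.frame(
  Crew = c(0, 0, 0, 1, 1),
  `Passenger Class` = c(1, 3, 3, NA, NA),
  check.names = FALSE
)

# Two-way table of Crew against Passenger Class.
# useNA = "ifany" adds an NA column when missing values are present.
crew_by_class <- table(
  Crew = toy$Crew,
  `Passenger Class` = toy$`Passenger Class`,
  useNA = "ifany"
)
print(crew_by_class)
```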

Tables

Now let’s get some descriptive information about the people on the ship using the table() function, which builds a crosstab. The variables we are interested in in the SS_Princess_Alice data set are: Gender, Passenger Class, Crew, Survival.

#Here is the first. You add the others.
table(SS_Princess_Alice$Gender)
## 
##   0   1 
## 377 460
table(SS_Princess_Alice$`Passenger Class`)
## < table of extent 0 >
table(SS_Princess_Alice$Crew)
## 
##   0   1 
## 858  37
table(SS_Princess_Alice$Survival)
## 
##   0   1 
## 718 177
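
Counts are often easier to read as proportions. As a small sketch, the code below re-enters the Survival counts printed above by hand and converts them with prop.table(); the vector name counts is just an illustration.

```r
# Survival counts copied from the table output above.
counts <- c("0" = 718, "1" = 177)

# Convert counts to proportions (each count divided by the total).
props <- prop.table(counts)
print(round(props, 4))
```

On the real data, prop.table(table(SS_Princess_Alice$Survival)) should give the same proportions, since table() leaves out the NA values by default.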

We would also like to know about the proportion who are 15 and under. The problem is that the existing Child variable does not agree with Age: it marks only about 4% of people as children, while Age, given in years, shows that at least a quarter of those with a known age are 15 or under. Let’s update the Child variable with information from the Age variable and then run the crosstab. (For your other ship you need to see if this is necessary and possible. You may need to do other similar things with other variables.)

SS_Princess_Alice$Child <- as.numeric(SS_Princess_Alice$Age) <= 15

# Add the crosstab below
table(SS_Princess_Alice$Child)
## 
## FALSE  TRUE 
##   430   151

Explain what you think the code SS_Princess_Alice$Child <- as.numeric(SS_Princess_Alice$Age) <= 15 does.

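This code converts Age to a number, tests whether each person is 15 or younger, and stores the result in the Child variable, replacing its previous values. The result is TRUE for children, FALSE for everyone older, and NA when Age is missing.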
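
To see how the comparison treats missing ages, here is a small sketch on a made-up vector of ages (the names age and child are illustrative only).

```r
# Made-up ages, including one missing value.
age <- c(29, 33, NA, 65, 16, 0, 15)

# Comparison is done element by element; NA stays NA.
child <- age <= 15
print(child)

# Count TRUE, FALSE, and NA; useNA = "ifany" shows the missing ones too.
child_counts <- table(child, useNA = "ifany")
print(child_counts)
```

This is why the Child table has only 581 entries: people with a missing Age get NA rather than FALSE, and table() drops NA by default.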

Summary

According to the data, women made up about 57% of the recorded passengers. The original Child variable marks about 4% of people as children, but the Child variable recoded from Age shows that 151 of the 581 people with a known age (about 26%) were 15 or under. Furthermore, the ages ranged from 0 to 92. However, the ages of 317 people were not available. The crew did not make up a big proportion of the entire count of people on board: it accounted for only about 4% (37 of 895). Companionship was not available, so it is difficult to tell whether many families boarded the ship together. The Women and children first variable is 0 for this ship, which suggests that no such order was recorded, and survival was only around 20%.