Now that we have our data sets set up, let’s take a look at them and start to analyze what happened (and learn some R basics along the way).

Let’s focus on the SS Vestris.

Start by loading the dataset that you previously saved.

load(file.path("~", "data", "SS Vestris.Rda"))
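`load()` restores objects under the names they were saved with and invisibly returns those names, so you can check what you just loaded. The sketch below is a minimal illustration, assuming the saved object is called `SS_Vestris` (the name used in the rest of this section); the toy data frame and the temporary file are hypothetical stand-ins for your real file.

```r
# Toy stand-in for the real data set (hypothetical values)
SS_Vestris <- data.frame(Id_12 = 1:3, Survival = c(1, 0, 1))

# Save it to a temporary .Rda file, then remove it from the workspace
path <- file.path(tempdir(), "SS Vestris.Rda")
save(SS_Vestris, file = path)
rm(SS_Vestris)

# load() returns (invisibly) the names of the objects it restored
loaded <- load(path)
print(loaded)               # [1] "SS_Vestris"
print(exists("SS_Vestris")) # [1] TRUE
```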

Describing the Data

The data we downloaded contain one row per person on board a given ship, in this case the SS Vestris.

Get some basic information about the SS Vestris data set with the str() and summary() functions. That is, put the name of the data set inside the parentheses: str(SS_Vestris) and summary(SS_Vestris).

str(SS_Vestris)
## Classes 'tbl_df', 'tbl' and 'data.frame':    328 obs. of  20 variables:
##  $ Id_12                      : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Ship Id                    : num  12 12 12 12 12 12 12 12 12 12 ...
##  $ Year                       : num  1928 1928 1928 1928 1928 ...
##  $ Nationality of the Ship    : chr  "U.K" "U.K" "U.K" "U.K" ...
##  $ Women and children first   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Quick                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Cause                      : chr  "Weather" "Weather" "Weather" "Weather" ...
##  $ No. of passengers          : num  113 113 113 113 113 113 113 113 113 113 ...
##  $ No. of women passengers    : num  38 38 38 38 38 38 38 38 38 38 ...
##  $ Women passengers/passengers: num  0.336 0.336 0.336 0.336 0.336 0.336 0.336 0.336 0.336 0.336 ...
##  $ Ship size                  : num  308 308 308 308 308 308 308 308 308 308 ...
##  $ Length of voyage           : num  2 2 2 2 2 2 2 2 2 2 ...
##  $ Gender                     : num  NA NA 0 0 1 0 0 0 1 0 ...
##  $ Age                        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Child                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Crew                       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Passenger Class            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Nationality of Passenger   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Companionship              : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Survival                   : num  1 1 1 1 1 1 1 1 1 1 ...
summary(SS_Vestris)
##      Id_12           Ship Id        Year      Nationality of the Ship
##  Min.   :  1.00   Min.   :12   Min.   :1928   Length:328             
##  1st Qu.: 82.75   1st Qu.:12   1st Qu.:1928   Class :character       
##  Median :164.50   Median :12   Median :1928   Mode  :character       
##  Mean   :164.50   Mean   :12   Mean   :1928                          
##  3rd Qu.:246.25   3rd Qu.:12   3rd Qu.:1928                          
##  Max.   :328.00   Max.   :12   Max.   :1928                          
##                                                                      
##  Women and children first     Quick      Cause           No. of passengers
##  Min.   :0                Min.   :0   Length:328         Min.   :113      
##  1st Qu.:0                1st Qu.:0   Class :character   1st Qu.:113      
##  Median :0                Median :0   Mode  :character   Median :113      
##  Mean   :0                Mean   :0                      Mean   :113      
##  3rd Qu.:0                3rd Qu.:0                      3rd Qu.:113      
##  Max.   :0                Max.   :0                      Max.   :113      
##                                                                           
##  No. of women passengers Women passengers/passengers   Ship size  
##  Min.   :38              Min.   :0.336               Min.   :308  
##  1st Qu.:38              1st Qu.:0.336               1st Qu.:308  
##  Median :38              Median :0.336               Median :308  
##  Mean   :38              Mean   :0.336               Mean   :308  
##  3rd Qu.:38              3rd Qu.:0.336               3rd Qu.:308  
##  Max.   :38              Max.   :0.336               Max.   :308  
##                                                                   
##  Length of voyage     Gender            Age          Child        
##  Min.   :2        Min.   :0.0000   Min.   : NA   Min.   :0.00000  
##  1st Qu.:2        1st Qu.:0.0000   1st Qu.: NA   1st Qu.:0.00000  
##  Median :2        Median :0.0000   Median : NA   Median :0.00000  
##  Mean   :2        Mean   :0.1331   Mean   :NaN   Mean   :0.03659  
##  3rd Qu.:2        3rd Qu.:0.0000   3rd Qu.: NA   3rd Qu.:0.00000  
##  Max.   :2        Max.   :1.0000   Max.   : NA   Max.   :1.00000  
##                   NA's   :20       NA's   :328                    
##       Crew        Passenger Class Nationality of Passenger Companionship
##  Min.   :0.0000   Min.   : NA     Min.   : NA              Min.   : NA  
##  1st Qu.:0.0000   1st Qu.: NA     1st Qu.: NA              1st Qu.: NA  
##  Median :1.0000   Median : NA     Median : NA              Median : NA  
##  Mean   :0.5976   Mean   :NaN     Mean   :NaN              Mean   :NaN  
##  3rd Qu.:1.0000   3rd Qu.: NA     3rd Qu.: NA              3rd Qu.: NA  
##  Max.   :1.0000   Max.   : NA     Max.   : NA              Max.   : NA  
##                   NA's   :328     NA's   :328              NA's   :328  
##     Survival     
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :1.0000  
##  Mean   :0.6128  
##  3rd Qu.:1.0000  
##  Max.   :1.0000  
## 

You should notice some odd things.

Some variables are all set to NA. Which ones are these?

Keep in mind that some variables are not available for every ship. When a variable is not available for a ship, all of its values are missing.
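Rather than scanning the summary by eye, you can ask R which columns are entirely missing. A minimal sketch on a small hypothetical data frame; on the real data you would pass `SS_Vestris` in place of `df`.

```r
# Toy data frame (hypothetical values) with one column that is entirely NA
df <- data.frame(
  Survival = c(1, 0, 1),
  Age      = c(NA, NA, NA),
  Gender   = c(0, NA, 1)
)

# TRUE for each column in which every value is NA
all_missing <- vapply(df, function(x) all(is.na(x)), logical(1))
names(df)[all_missing]   # "Age"
```

Note that Gender is not flagged: it has some missing values, but not all.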

There are some variables where all the values are the same. Which are these?

Why do some variables have the same value for every observation? Think about what they refer to.
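A similar check lists the columns whose values never change. This is a sketch on hypothetical data; it looks only at non-missing values, so an all-NA column (which has no values at all) is not counted as constant.

```r
# Toy data frame (hypothetical values)
df <- data.frame(
  Year     = c(1928, 1928, 1928),
  Survival = c(1, 0, 1),
  Age      = c(NA, NA, NA)
)

# A column is constant if its non-missing values take exactly one distinct value
is_constant <- vapply(df, function(x) length(unique(x[!is.na(x)])) == 1, logical(1))
names(df)[is_constant]   # "Year"
```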

Browse the dataset. There is something interesting about how the Crew and Passenger Class variables relate to each other. What is that?
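Besides browsing, a two-way table that keeps missing values is one way to see how two variables line up. A minimal sketch with hypothetical values; column names containing spaces, such as `Passenger Class`, must be wrapped in backticks, and `useNA = "ifany"` adds an NA level so missing values are not silently dropped.

```r
# Toy data (hypothetical values); check.names = FALSE keeps the space in the name
df <- data.frame(
  Crew = c(0, 0, 1, 1),
  `Passenger Class` = c(1, 3, NA, NA),
  check.names = FALSE
)

# Rows = Crew, columns = Passenger Class, with an extra column for NA
tab <- table(Crew = df$Crew, Class = df$`Passenger Class`, useNA = "ifany")
print(tab)
```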

Tables

Now let’s get some descriptive information about the people on the ship using the crosstab function. The variables of interest in the SS_Vestris data set are Gender, Crew, and Survival.

#Here is the first. You add the others.
crosstab(SS_Vestris, row.vars = "Gender")
##       
## Gender  Count Total %
##    0   267.00   86.69
##    1    41.00   13.31
##    Sum 308.00  100.00
crosstab(SS_Vestris, row.vars = "Crew")
##      
## Crew   Count Total %
##   0   132.00   40.24
##   1   196.00   59.76
##   Sum 328.00  100.00
crosstab(SS_Vestris, row.vars = "Survival")
##         
## Survival  Count Total %
##      0   127.00   38.72
##      1   201.00   61.28
##      Sum 328.00  100.00
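If the crosstab function is not available, base R's table() and prop.table() produce the same counts and percentages. A minimal sketch with hypothetical values. Like the Gender crosstab above (which sums to 308, not 328), table() leaves out NA values by default, so the percentages are shares of the non-missing values.

```r
# Hypothetical Gender values, including one missing entry
gender <- c(0, 0, 1, 0, NA)

counts  <- table(gender)                        # NA is dropped by default
percent <- round(100 * prop.table(counts), 2)   # share of non-missing values

# Side-by-side layout similar to the crosstab output
result <- cbind(Count = counts, `Total %` = percent)
print(result)
```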

We would also like to know the proportion of people aged 15 and under. In some data sets the Child variable is all missing while age is given in years; in that case we update the Child variable with information from the Age variable and then run the crosstab. For the SS Vestris this is not possible: Age is missing for all 328 people (see the summary above). The Child variable, however, already holds 0/1 values with no missing entries, so it can be tabulated directly. (For your other ship, check whether this step is necessary and possible. You may need to do similar things with other variables.)

The Age variable is not available.

# Add the crosstab below

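Explain what you think the code SS_Vestris$Child <- as.numeric(SS_Vestris$Age) <= 15 does. Note that Age is NA for every SS Vestris record, so running this line would overwrite Child with NA in every row.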
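The line converts Age to numbers, compares each value with 15, and assigns the resulting TRUE/FALSE/NA vector to Child, replacing whatever was there. The toy example below (hypothetical values) shows what happens when Age is missing: any comparison with NA is NA, so existing Child values are lost. The guarded variant with ifelse() is a suggestion of ours, not part of the original exercise; it only recodes rows where Age is known and keeps the 0/1 coding.

```r
# Hypothetical data: Child already coded 0/1, Age entirely missing
df <- data.frame(Child = c(0, 1, 0), Age = c(NA, NA, NA))

# The exercise's line applied to a copy: every comparison with NA gives NA
naive <- df
naive$Child <- as.numeric(naive$Age) <= 15
print(naive$Child)   # [1] NA NA NA

# Safer variant (suggestion): keep the old value wherever Age is missing,
# and convert TRUE/FALSE to 1/0 to match the existing coding
safe <- df
age <- as.numeric(safe$Age)
safe$Child <- ifelse(is.na(age), safe$Child, as.numeric(age <= 15))
print(safe$Child)    # [1] 0 1 0
```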

Summary

Write a paragraph describing the people who were on board the SS Vestris and what happened to them. Out of the 113 passengers, and out of everyone on board (men, women, and crew members), women had a higher survival rate.
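A statement about who survived is easier to support with survival rates by group. A minimal sketch with hypothetical values: a two-way table of Gender by Survival, converted to proportions within each row (`margin = 1`). Which Gender code stands for women is an assumption here; check it against your codebook before interpreting the rows.

```r
# Hypothetical values; assumes Gender 1 = female, 0 = male (check the codebook)
df <- data.frame(
  Gender   = c(0, 0, 0, 1, 1, NA),
  Survival = c(1, 0, 1, 1, 0, 1)
)

# Rows = Gender, columns = Survival; the row with NA Gender is dropped
counts <- table(Gender = df$Gender, Survival = df$Survival)

# Proportions within each Gender row, i.e. survival rates by group
rates <- prop.table(counts, margin = 1)
print(round(rates, 3))

# Survival rate for each Gender group
rates[, "1"]
```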