Now that we have our data sets set up, let’s take a look at them and start to analyze what happened. (We will also learn some R basics along the way.)

Let’s focus on the Titanic.

Start by loading the dataset that you previously saved.

load(file.path("~", "data", "MV_Princess_Victoria.Rda"))
# Do the same thing for your ship.
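
If it is unclear what load() actually brought into the session, its return value can help. The sketch below is only an illustration: ship_demo and the temporary file are hypothetical stand-ins for the real ship file, which is not touched here.

```r
# Hypothetical stand-in for a ship data set, saved to a temporary folder
ship_demo <- data.frame(Crew = c(1, 0, 0), Survival = c(0, 1, 0))
demo_path <- file.path(tempdir(), "ship_demo.Rda")
save(ship_demo, file = demo_path)
rm(ship_demo)  # remove it so we can see load() bring it back

# load() restores objects under the names they were saved with
# and (invisibly) returns those names as a character vector
loaded_names <- load(demo_path)
print(loaded_names)
print(exists("ship_demo"))
```

The object keeps the name it had when it was saved, which is why the data set name used later matches the file name only by convention.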

Describing the Data

The data we downloaded contain one row per person on board a given ship, in this case the RMS Titanic and your ship.

Find some basic information about the data for your ship using the str() function and the summary() function. That means putting the name of the data set inside the parentheses (here MV_Princess_Victoria in place of RMS_Titanic).

str(MV_Princess_Victoria)
## Classes 'tbl_df', 'tbl' and 'data.frame':    179 obs. of  20 variables:
##  $ Id_14                      : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Ship Id                    : num  14 14 14 14 14 14 14 14 14 14 ...
##  $ Year                       : num  1953 1953 1953 1953 1953 ...
##  $ Nationality of the Ship    : chr  "U.K" "U.K" "U.K" "U.K" ...
##  $ Women and children first   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Quick                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Cause                      : chr  "Weather" "Weather" "Weather" "Weather" ...
##  $ No. of passengers          : num  129 129 129 129 129 129 129 129 129 129 ...
##  $ No. of women passengers    : num  26 26 26 26 26 26 26 26 26 26 ...
##  $ Women passengers/passengers: num  0.202 0.202 0.202 0.202 0.202 0.202 0.202 0.202 0.202 0.202 ...
##  $ Ship size                  : num  179 179 179 179 179 179 179 179 179 179 ...
##  $ Length of voyage           : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Gender                     : num  0 1 0 0 0 0 0 1 1 1 ...
##  $ Age                        : num  25 39 42 33 NA NA 53 NA NA NA ...
##  $ Child                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Crew                       : num  1 1 1 1 1 1 0 1 1 1 ...
##  $ Passenger Class            : logi  NA NA NA NA NA NA ...
##  $ Nationality of Passenger   : chr  "Northern Ireland" "Northern Ireland" "Northern Ireland" "Northern Ireland" ...
##  $ Companionship              : logi  NA NA NA NA NA NA ...
##  $ Survival                   : num  0 0 1 0 0 0 0 0 0 0 ...
summary(MV_Princess_Victoria)
##      Id_14          Ship Id        Year      Nationality of the Ship
##  Min.   :  1.0   Min.   :14   Min.   :1953   Length:179             
##  1st Qu.: 45.5   1st Qu.:14   1st Qu.:1953   Class :character       
##  Median : 90.0   Median :14   Median :1953   Mode  :character       
##  Mean   : 90.0   Mean   :14   Mean   :1953                          
##  3rd Qu.:134.5   3rd Qu.:14   3rd Qu.:1953                          
##  Max.   :179.0   Max.   :14   Max.   :1953                          
##                                                                     
##  Women and children first     Quick      Cause           No. of passengers
##  Min.   :0                Min.   :0   Length:179         Min.   :129      
##  1st Qu.:0                1st Qu.:0   Class :character   1st Qu.:129      
##  Median :0                Median :0   Mode  :character   Median :129      
##  Mean   :0                Mean   :0                      Mean   :129      
##  3rd Qu.:0                3rd Qu.:0                      3rd Qu.:129      
##  Max.   :0                Max.   :0                      Max.   :129      
##                                                                           
##  No. of women passengers Women passengers/passengers   Ship size  
##  Min.   :26              Min.   :0.202               Min.   :179  
##  1st Qu.:26              1st Qu.:0.202               1st Qu.:179  
##  Median :26              Median :0.202               Median :179  
##  Mean   :26              Mean   :0.202               Mean   :179  
##  3rd Qu.:26              3rd Qu.:0.202               3rd Qu.:179  
##  Max.   :26              Max.   :0.202               Max.   :179  
##                                                                   
##  Length of voyage     Gender            Age            Child        
##  Min.   :1        Min.   :0.0000   Min.   : 2.00   Min.   :0.00000  
##  1st Qu.:1        1st Qu.:0.0000   1st Qu.:22.00   1st Qu.:0.00000  
##  Median :1        Median :0.0000   Median :28.00   Median :0.00000  
##  Mean   :1        Mean   :0.1732   Mean   :31.27   Mean   :0.02809  
##  3rd Qu.:1        3rd Qu.:0.0000   3rd Qu.:37.00   3rd Qu.:0.00000  
##  Max.   :1        Max.   :1.0000   Max.   :70.00   Max.   :1.00000  
##                                    NA's   :86      NA's   :1        
##       Crew        Passenger Class Nationality of Passenger Companionship 
##  Min.   :0.0000   Mode:logical    Length:179               Mode:logical  
##  1st Qu.:0.0000   NA's:179        Class :character         NA's:179      
##  Median :0.0000                   Mode  :character                       
##  Mean   :0.2793                                                          
##  3rd Qu.:1.0000                                                          
##  Max.   :1.0000                                                          
##                                                                          
##     Survival     
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.0000  
##  Mean   :0.2458  
##  3rd Qu.:0.0000  
##  Max.   :1.0000  
## 

You will probably notice some odd things.

Some variables are all set to NA. Which ones are these?

We need to pay attention to the fact that there are some variables not available for all of the ships. When a variable is not available, all values will be missing.
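
For a data set with many columns, the all-missing variables can also be found programmatically. This is a minimal sketch on a made-up data frame; the name demo and its simplified column names are assumptions for illustration, not the real ship data.

```r
# Hypothetical toy data: one column is entirely missing
demo <- data.frame(
  Age             = c(25, NA, 42),
  Crew            = c(1, 1, 0),
  Passenger_Class = c(NA, NA, NA)
)

# Count the non-missing values in each column;
# a count of zero means every value is NA
non_missing <- colSums(!is.na(demo))
all_missing <- names(demo)[non_missing == 0]
print(all_missing)
```

Note that Age has some missing values but is not reported, because only columns with no data at all are selected.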

There are some variables where all the values are the same. Which are these?

Why do some variables have the same value for every observation? Think about what they refer to. (Think back to the ships data.)
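
One possible way to list the constant variables is to count the distinct non-missing values in each column. The data frame below is a hypothetical miniature with simplified column names; only the idea carries over to the real data.

```r
# Hypothetical toy data: Ship_Id and Year describe the ship, not the person
demo <- data.frame(
  Ship_Id         = c(14, 14, 14),
  Year            = c(1953, 1953, 1953),
  Gender          = c(0, 1, 0),
  Passenger_Class = c(NA, NA, NA)
)

# A column is constant when it has exactly one distinct non-missing value.
# All-NA columns have zero distinct values, so they are not flagged here.
is_constant <- vapply(
  demo,
  function(x) length(unique(x[!is.na(x)])) == 1,
  logical(1)
)
constant_vars <- names(demo)[is_constant]
print(constant_vars)
```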

Browse the dataset. There is something interesting about how the Crew and Passenger Class variables relate to each other. What is that?

Tables

Now let’s get some descriptive information about the people on the ship using the frequency() function. The variables of interest in the RMS_Titanic data set are Gender, Crew, Survival and any other variables that are nominal or ordinal and have data.

Remember the code will look like this: frequency(MV_Princess_Victoria$Crew)

frequency(MV_Princess_Victoria$Crew)
##  Values Freq Percent
##  0      129  72.1   
##  1      50   27.9   
##  Total  179  100
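
The frequency() function used here appears to come from the course materials; base R's own frequency() is meant for time series and does not produce a table like this. As a rough sketch of what such a table involves, the same counts and percentages can be built with table() and prop.table(). The crew vector below is reconstructed from the counts in the output above, and freq_table is a hypothetical name.

```r
# Rebuild the Crew variable from the counts shown above (129 zeros, 50 ones)
crew <- c(rep(0, 129), rep(1, 50))

counts  <- table(crew)                           # frequency of each value
percent <- round(100 * prop.table(counts), 1)    # share of the total, in percent

freq_table <- data.frame(
  Values  = names(counts),
  Freq    = as.vector(counts),
  Percent = as.vector(percent)
)
print(freq_table)
```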

Are there any interval variables? If so, run the summary() function on them.

summary(MV_Princess_Victoria$Age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    2.00   22.00   28.00   31.27   37.00   70.00      86
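
The NA's count matters: summary() quietly drops missing values, but most other functions do not. The short sketch below uses the first ten Age values printed by str() earlier; the variable names are illustrative only.

```r
# First ten Age values as shown in the str() output
age <- c(25, 39, 42, 33, NA, NA, 53, NA, NA, NA)

mean_with_na    <- mean(age)                 # NA, because some values are missing
mean_without_na <- mean(age, na.rm = TRUE)   # average of the five known ages
n_missing       <- sum(is.na(age))           # how many ages are unknown

print(mean_with_na)
print(mean_without_na)
print(n_missing)
```

With 86 of 179 ages missing in the full data set, any statement about age describes only the people whose age was recorded.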

Now run at least three cross tabs. The dependent variable (row.vars) should be Survival for all of them. The two shared variables are Crew and Gender. Pick one other variable describing individual people.

Remember that the code will look like: crosstab(DATASET, row.vars="", col.vars="", title="", format="col_percent")

crosstab(MV_Princess_Victoria, row.vars="Survival", col.vars="Crew", title="Survival by Crew", format="col_percent")
## Survival by Crew
##         0    1 
## 0       73.6 80
## 1       26.4 20
## Total N 129  50
crosstab(MV_Princess_Victoria, row.vars="Survival", col.vars="Gender", title="Survival by Gender", format="col_percent")
## Survival by Gender
##         0    1  
## 0       70.3 100
## 1       29.7 0  
## Total N 148  31
crosstab(MV_Princess_Victoria, row.vars="Crew", col.vars="Gender", title="Crew by Gender", format="col_percent")
## Crew by Gender
##         0    1   
## 0       69.6 83.9
## 1       30.4 16.1
## Total N 148  31
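
To see where column percentages come from, the Survival by Crew table can be approximated in base R with table() and prop.table(). This is a sketch, not the course's crosstab() implementation. The cell counts (95, 34, 40, 10) are reconstructed from the percentages and column totals above, and demo is a hypothetical name.

```r
# Rebuild Crew and Survival from the reconstructed cell counts
demo <- data.frame(
  Crew     = c(rep(0, 129), rep(1, 50)),
  Survival = c(rep(0, 95), rep(1, 34),   # passengers: 95 died, 34 survived
               rep(0, 40), rep(1, 10))   # crew: 40 died, 10 survived
)

counts  <- table(Survival = demo$Survival, Crew = demo$Crew)
# margin = 2 divides each cell by its column total
col_pct <- round(100 * prop.table(counts, margin = 2), 1)
total_n <- colSums(counts)

print(col_pct)
print(total_n)
```

Each column sums to 100, so the percentages compare the outcomes of passengers and crew even though the two groups differ in size.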

Summary

Write a paragraph describing the people who were on board your ship and what happened to them.

According to our dataset, 73.6 percent of the passengers on the MV Princess Victoria died, as did 80 percent of the crew; 26.4 percent of passengers and 20 percent of crew members survived. A higher share of crew members died than passengers because the crew would assist passengers in putting on lifejackets. They also worked to prepare the lifeboats for launching, although the chances of successfully launching lifeboats in the gale were very slim, and they had to try to launch them while the vessel was listing badly due to flooding. Considering survival and gender, 70.3 percent of males died and 100 percent of females died. All of the survivors were male, partly because most passengers and crew members were male, but also because the captain did not order women and children first on the lifeboats that were able to launch.