Now that we have our data sets set up, let’s take a look at them and start to analyze what happened (and learn some R basics along the way).

Let’s focus on the Titanic.

Start by loading the dataset that you previously saved.

  load(paste("~","data/RMS_Titanic.Rda", sep="/"))
# Do the same thing for your ship.
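As a small aside on the loading step, here is a sketch of an equivalent way to build the same path with file.path(), plus a check that the file exists before calling load(). The file location is the one used above; whether the file is really there depends on where you saved it in the earlier lab.

```r
# Two equivalent ways to build the path used above.
path_paste <- paste("~", "data/RMS_Titanic.Rda", sep = "/")
path_file  <- file.path("~", "data", "RMS_Titanic.Rda")
identical(path_paste, path_file)

# Load only if the file is really there.
# load() returns the names of the objects it restored.
if (file.exists(path_file)) {
  loaded <- load(path_file)
  print(loaded)
} else {
  message("File not found: ", path.expand(path_file))
}
```

Printing the result of load() is a quick way to confirm which data set name was restored into your workspace.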

Describing the Data

The data we downloaded contain one row per person on board a given ship, in this case the RMS Titanic and your ship.

Find some basic information about the data using the str() and summary() functions, first on the RMS_Titanic data set and then on the data set for your ship. To do that, put the name of the data set inside the parentheses.

str(RMS_Titanic)
## Classes 'tbl_df', 'tbl' and 'data.frame':    2208 obs. of  20 variables:
##  $ Id_8                       : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Ship Id                    : num  8 8 8 8 8 8 8 8 8 8 ...
##  $ Year                       : num  1912 1912 1912 1912 1912 ...
##  $ Nationality of the Ship    : chr  "U.K" "U.K" "U.K" "U.K" ...
##  $ Women and children first   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Quick                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Cause                      : chr  "Collision" "Collision" "Collision" "Collision" ...
##  $ No. of passengers          : num  1317 1317 1317 1317 1317 ...
##  $ No. of women passengers    : num  463 463 463 463 463 463 463 463 463 463 ...
##  $ Women passengers/passengers: num  0.352 0.352 0.352 0.352 0.352 0.352 0.352 0.352 0.352 0.352 ...
##  $ Ship size                  : num  2208 2208 2208 2208 2208 ...
##  $ Length of voyage           : num  5 5 5 5 5 5 5 5 5 5 ...
##  $ Gender                     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Age                        : num  42 16 14 21 30 33 30 26 26 20 ...
##  $ Child                      : logi  NA NA NA NA NA NA ...
##  $ Crew                       : num  0 0 0 1 0 1 0 0 1 1 ...
##  $ Passenger Class            : num  3 3 3 NA 2 NA 3 3 NA NA ...
##  $ Nationality of Passenger   : logi  NA NA NA NA NA NA ...
##  $ Companionship              : logi  NA NA NA NA NA NA ...
##  $ Survival                   : num  0 0 0 0 0 0 0 0 0 0 ...
summary(RMS_Titanic)
##       Id_8           Ship Id       Year      Nationality of the Ship
##  Min.   :   1.0   Min.   :8   Min.   :1912   Length:2208            
##  1st Qu.: 552.8   1st Qu.:8   1st Qu.:1912   Class :character       
##  Median :1104.5   Median :8   Median :1912   Mode  :character       
##  Mean   :1104.5   Mean   :8   Mean   :1912                          
##  3rd Qu.:1656.2   3rd Qu.:8   3rd Qu.:1912                          
##  Max.   :2208.0   Max.   :8   Max.   :1912                          
##                                                                     
##  Women and children first     Quick      Cause           No. of passengers
##  Min.   :1                Min.   :0   Length:2208        Min.   :1317     
##  1st Qu.:1                1st Qu.:0   Class :character   1st Qu.:1317     
##  Median :1                Median :0   Mode  :character   Median :1317     
##  Mean   :1                Mean   :0                      Mean   :1317     
##  3rd Qu.:1                3rd Qu.:0                      3rd Qu.:1317     
##  Max.   :1                Max.   :0                      Max.   :1317     
##                                                                           
##  No. of women passengers Women passengers/passengers   Ship size   
##  Min.   :463             Min.   :0.352               Min.   :2208  
##  1st Qu.:463             1st Qu.:0.352               1st Qu.:2208  
##  Median :463             Median :0.352               Median :2208  
##  Mean   :463             Mean   :0.352               Mean   :2208  
##  3rd Qu.:463             3rd Qu.:0.352               3rd Qu.:2208  
##  Max.   :463             Max.   :0.352               Max.   :2208  
##                                                                    
##  Length of voyage     Gender            Age         Child        
##  Min.   :5        Min.   :0.0000   Min.   : 0.00   Mode:logical  
##  1st Qu.:5        1st Qu.:0.0000   1st Qu.:22.00   NA's:2208     
##  Median :5        Median :0.0000   Median :29.00                 
##  Mean   :5        Mean   :0.2201   Mean   :29.91                 
##  3rd Qu.:5        3rd Qu.:0.0000   3rd Qu.:36.00                 
##  Max.   :5        Max.   :1.0000   Max.   :74.00                 
##                                    NA's   :10                    
##       Crew        Passenger Class Nationality of Passenger Companionship 
##  Min.   :0.0000   Min.   :1.000   Mode:logical             Mode:logical  
##  1st Qu.:0.0000   1st Qu.:2.000   NA's:2208                NA's:2208     
##  Median :0.0000   Median :3.000                                          
##  Mean   :0.4035   Mean   :2.292                                          
##  3rd Qu.:1.0000   3rd Qu.:3.000                                          
##  Max.   :1.0000   Max.   :3.000                                          
##                   NA's   :891                                            
##     Survival     
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.0000  
##  Mean   :0.3225  
##  3rd Qu.:1.0000  
##  Max.   :1.0000  
## 

You will probably notice some odd things.

Some variables are all set to NA. Which ones are these?

We need to pay attention to the fact that there are some variables not available for all of the ships. When a variable is not available, all values will be missing.
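Rather than scanning the summary() output by eye, one possible way to list the variables that are entirely missing is sketched below. It uses a small made-up data frame called toy (hypothetical values, not the real Titanic data); the same two lines should work on RMS_Titanic or on your ship's data set.

```r
# Toy stand-in for a ship data set (hypothetical values, not the real data).
toy <- data.frame(
  Age      = c(42, 16, NA, 21),
  Child    = c(NA, NA, NA, NA),
  Survival = c(0, 0, 1, 0)
)

# For each column, TRUE when every value is missing.
all_missing <- sapply(toy, function(x) all(is.na(x)))

# Names of the columns that are entirely NA.
names(toy)[all_missing]
```

Note that Age has one missing value but is not listed, because only columns where every value is NA qualify.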

There are some variables where all the values are the same. Which are these?

Why do some variables have the same value for every observation? Think about what they refer to. (Think back to the ships data.)
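A similar check can flag the variables that never vary. The sketch below again uses a made-up data frame (toy, with hypothetical values) and treats a column as constant when it has exactly one distinct non-missing value.

```r
# Toy stand-in: Year and Cause describe the ship, Age and Survival the person.
toy <- data.frame(
  Year     = c(1912, 1912, 1912, 1912),
  Cause    = c("Collision", "Collision", "Collision", "Collision"),
  Age      = c(42, 16, 14, 21),
  Survival = c(0, 0, 1, 0)
)

# A column is constant when it has exactly one distinct non-missing value.
is_constant <- sapply(toy, function(x) length(unique(x[!is.na(x)])) == 1)

names(toy)[is_constant]
```

Columns that are entirely NA have zero distinct non-missing values, so they are not reported as constant here.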

Browse the dataset. There is something interesting about how the Crew and Passenger Class variables relate to each other. What is that?
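If browsing row by row is slow, one way to look at two variables together, including their missing values, is base R's table() with useNA. The sketch below uses a made-up data frame (toy, hypothetical values); note the backticks, which are needed because the column name Passenger Class contains a space.

```r
# Toy stand-in (hypothetical values). check.names = FALSE keeps the space
# in the column name, as in the real data set.
toy <- data.frame(
  Crew = c(0, 0, 1, 0, 1),
  `Passenger Class` = c(3, 2, NA, 1, NA),
  check.names = FALSE
)

# Cross-tabulate the two variables and keep NA as its own column.
class_by_crew <- table(
  Crew  = toy$Crew,
  Class = toy$`Passenger Class`,
  useNA = "ifany"
)
class_by_crew
```

Running the same call on RMS_Titanic shows which combinations of Crew and Passenger Class actually occur.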

Tables

Now let’s get some descriptive information about the people on the ship using the frequency() function. The variables we are interested in from the RMS_Titanic data set are Gender, Crew, Survival, and any other nominal or ordinal variables that have data.

Remember the code will look like this:

frequency(RMS_Titanic$Crew)
##  Values Freq Percent
##  0      1317 59.6   
##  1      891  40.4   
##  Total  2208 100
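The frequency() used here appears to be a course helper rather than base R's frequency(), which reports the sampling frequency of a time series. If the helper is not available, a comparable table can be sketched with table() and prop.table(). The crew vector below is rebuilt from the counts printed above (1317 and 891) so that the example runs on its own; with the real data you would use RMS_Titanic$Crew instead.

```r
# Rebuild the Crew variable from the counts shown above (stand-in for
# RMS_Titanic$Crew so the example is self-contained).
crew <- rep(c(0, 1), times = c(1317, 891))

counts  <- table(crew)                         # how many 0s and 1s
percent <- round(100 * prop.table(counts), 1)  # share of the total, in percent

data.frame(
  Values  = names(counts),
  Freq    = as.vector(counts),
  Percent = as.vector(percent)
)
```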

Are there any interval variables? If so, run the summary() function on them.

summary(RMS_Titanic$Age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   22.00   29.00   29.91   36.00   74.00      10
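The NA's column in that output matters when you compute statistics yourself: summary() skips missing values, but functions such as mean() do not unless told to. A minimal illustration with a made-up vector (hypothetical ages, not the real data):

```r
# Hypothetical ages with one missing value.
age <- c(42, 16, NA, 21, 30)

mean(age)                # NA, because one value is missing
mean(age, na.rm = TRUE)  # 27.25, the mean of the four known ages
sum(is.na(age))          # 1, the number of missing values
```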

Now run at least three cross tabs. The dependent variable (row.vars) should be Survival for all of them. Everyone should use Crew and Gender as the independent variable (col.vars) for two of them. For the third, pick one other variable describing individual people.

Remember that the code will look like: crosstab(DATASET, row.vars="Survival", col.vars="Crew", title="Survival by Crew Status", format="col_percent")

crosstab(RMS_Titanic, row.vars="Survival", col.vars="Crew", title="Survival by Crew", format="col_percent")
## Survival by Crew
##         0    1   
## 0       62   76.2
## 1       38   23.8
## Total N 1317 891
crosstab(RMS_Titanic, row.vars="Survival", col.vars="Gender", title="Survival by Gender", format="col_percent")
## Survival by Gender
##         0    1   
## 0       79.3 26.7
## 1       20.7 73.3
## Total N 1722 486
crosstab(RMS_Titanic, row.vars="Crew", col.vars="Gender", title="Crew by Gender", format="col_percent")
## Crew by Gender
##         0    1   
## 0       49.6 95.3
## 1       50.4 4.7 
## Total N 1722 486
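To see what format="col_percent" is doing, the Survival by Crew table can be approximated in base R with table() and prop.table(). This is a sketch, not the crosstab() implementation. The cell counts (817, 500, 679, 212) are an assumption: they were back-calculated from the percentages and column totals printed above, so that the example runs without the data file.

```r
# Cell counts back-calculated from the Survival by Crew output (assumption).
n <- c(817, 500, 679, 212)
survival <- rep(c(0, 1, 0, 1), times = n)
crew     <- rep(c(0, 0, 1, 1), times = n)

counts <- table(Survival = survival, Crew = crew)

# margin = 2 divides each cell by its column total, so every column sums to 100.
col_percent <- round(100 * prop.table(counts, margin = 2), 1)
col_percent

# Column totals, matching the "Total N" row.
colSums(counts)
```

Using margin = 1 instead would give row percentages, which answer a different question: of those who died, what share were crew?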

Summary

Write a paragraph describing the people who were on board your ship and what happened to them.

According to our dataset, 62 percent of the passengers on the Titanic died, compared with 76.2 percent of the crew; 38 percent of passengers and 23.8 percent of crew members survived. A larger share of the crew died, likely because passengers had priority for the lifeboats. Looking at survival by gender, 79.3 percent of males died, compared with 26.7 percent of females. Males probably died at a higher rate in part because far more of them were crew members (50.4 percent of males versus 4.7 percent of females). In addition, women and children had priority for the lifeboats.
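Percentages and counts can tell different stories when the groups differ in size. As a rough check, the column percentages and group totals from the Survival by Crew table can be turned back into approximate numbers of deaths. The results are approximate because the printed percentages are rounded.

```r
# Group sizes and percent who died, taken from the Survival by Crew table.
group_n  <- c(passengers = 1317, crew = 891)
pct_died <- c(passengers = 62.0, crew = 76.2)

# Approximate number of deaths in each group.
approx_died <- round(group_n * pct_died / 100)
approx_died
```

By this estimate about 817 passengers and 679 crew members died, so the crew had the higher death rate while the passengers had the larger number of deaths.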