Now that we have our data sets set up, let’s take a look at them and start to analyze what happened (and learn some R basics along the way).
Let’s focus on the Titanic.
Start by loading the dataset that you previously saved.
load(paste("~","data/RMS_Titanic.Rda",sep="/"))
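load() restores the objects stored in an .Rda file under the names they had when they were saved, so after this call the data set is available as RMS_Titanic. The sketch below shows that behavior with a tiny made-up data frame. It writes to a temporary folder (tempdir()) instead of ~/data so it can run anywhere, and it builds the path with file.path(), which does the same job as paste(..., sep = "/").

```r
# Made-up values for illustration only, not the real Titanic data
RMS_Titanic <- data.frame(Age = c(42, 16, 14), Survival = c(0, 0, 1))

# file.path() joins path pieces with "/"; tempdir() avoids touching ~/data
path <- file.path(tempdir(), "RMS_Titanic.Rda")
save(RMS_Titanic, file = path)

rm(RMS_Titanic)          # remove the object from the workspace
loaded <- load(path)     # load() returns the names of the restored objects
print(loaded)            # the object comes back under its saved name
exists("RMS_Titanic")    # TRUE again
```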
In the data, gender is coded with the numbers 0 and 1. Survival varied by age, class, and gender.
The data we downloaded contain one row per person on board a given ship, in this case the RMS Titanic.
Get some basic information about the RMS_Titanic data set with the str() and summary() functions; that is, put the name of the data set inside the parentheses.
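Because Gender is stored as 0/1, tables and summaries show numbers rather than words. One option is to attach labels with factor(). The sketch below uses a made-up data frame, and the mapping 0 = male, 1 = female is an assumption here: confirm it against the data's documentation before relying on it.

```r
# Toy data: Gender coded 0/1 as in RMS_Titanic (values are made up)
people <- data.frame(Gender = c(0, 0, 1, 0, 1))

# ASSUMPTION: 0 = male and 1 = female; check this against the codebook
people$GenderLabel <- factor(people$Gender, levels = c(0, 1),
                             labels = c("male", "female"))

# Counts now print with readable labels instead of 0 and 1
table(people$GenderLabel)
```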
str(RMS_Titanic)
## Classes 'tbl_df', 'tbl' and 'data.frame': 2208 obs. of 20 variables:
## $ Id_8 : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Ship Id : num 8 8 8 8 8 8 8 8 8 8 ...
## $ Year : num 1912 1912 1912 1912 1912 ...
## $ Nationality of the Ship : chr "U.K" "U.K" "U.K" "U.K" ...
## $ Women and children first : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Quick : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Cause : chr "Collision" "Collision" "Collision" "Collision" ...
## $ No. of passengers : num 1317 1317 1317 1317 1317 ...
## $ No. of women passengers : num 463 463 463 463 463 463 463 463 463 463 ...
## $ Women passengers/passengers: num 0.352 0.352 0.352 0.352 0.352 0.352 0.352 0.352 0.352 0.352 ...
## $ Ship size : num 2208 2208 2208 2208 2208 ...
## $ Length of voyage : num 5 5 5 5 5 5 5 5 5 5 ...
## $ Gender : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Age : num 42 16 14 21 30 33 30 26 26 20 ...
## $ Child : num NA NA NA NA NA NA NA NA NA NA ...
## $ Crew : num 0 0 0 1 0 1 0 0 1 1 ...
## $ Passenger Class : num 3 3 3 NA 2 NA 3 3 NA NA ...
## $ Nationality of Passenger : num NA NA NA NA NA NA NA NA NA NA ...
## $ Companionship : num NA NA NA NA NA NA NA NA NA NA ...
## $ Survival : num 0 0 0 0 0 0 0 0 0 0 ...
summary(RMS_Titanic)
## Id_8 Ship Id Year Nationality of the Ship
## Min. : 1.0 Min. :8 Min. :1912 Length:2208
## 1st Qu.: 552.8 1st Qu.:8 1st Qu.:1912 Class :character
## Median :1104.5 Median :8 Median :1912 Mode :character
## Mean :1104.5 Mean :8 Mean :1912
## 3rd Qu.:1656.2 3rd Qu.:8 3rd Qu.:1912
## Max. :2208.0 Max. :8 Max. :1912
##
## Women and children first Quick Cause No. of passengers
## Min. :1 Min. :0 Length:2208 Min. :1317
## 1st Qu.:1 1st Qu.:0 Class :character 1st Qu.:1317
## Median :1 Median :0 Mode :character Median :1317
## Mean :1 Mean :0 Mean :1317
## 3rd Qu.:1 3rd Qu.:0 3rd Qu.:1317
## Max. :1 Max. :0 Max. :1317
##
## No. of women passengers Women passengers/passengers Ship size
## Min. :463 Min. :0.352 Min. :2208
## 1st Qu.:463 1st Qu.:0.352 1st Qu.:2208
## Median :463 Median :0.352 Median :2208
## Mean :463 Mean :0.352 Mean :2208
## 3rd Qu.:463 3rd Qu.:0.352 3rd Qu.:2208
## Max. :463 Max. :0.352 Max. :2208
##
## Length of voyage Gender Age Child
## Min. :5 Min. :0.0000 Min. : 0.00 Min. : NA
## 1st Qu.:5 1st Qu.:0.0000 1st Qu.:22.00 1st Qu.: NA
## Median :5 Median :0.0000 Median :29.00 Median : NA
## Mean :5 Mean :0.2201 Mean :29.91 Mean :NaN
## 3rd Qu.:5 3rd Qu.:0.0000 3rd Qu.:36.00 3rd Qu.: NA
## Max. :5 Max. :1.0000 Max. :74.00 Max. : NA
## NA's :10 NA's :2208
## Crew Passenger Class Nationality of Passenger Companionship
## Min. :0.0000 Min. :1.000 Min. : NA Min. : NA
## 1st Qu.:0.0000 1st Qu.:2.000 1st Qu.: NA 1st Qu.: NA
## Median :0.0000 Median :3.000 Median : NA Median : NA
## Mean :0.4035 Mean :2.292 Mean :NaN Mean :NaN
## 3rd Qu.:1.0000 3rd Qu.:3.000 3rd Qu.: NA 3rd Qu.: NA
## Max. :1.0000 Max. :3.000 Max. : NA Max. : NA
## NA's :891 NA's :2208 NA's :2208
## Survival
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.3225
## 3rd Qu.:1.0000
## Max. :1.0000
##
You should notice some odd things.
Three variables are NA for every observation: Child, Nationality of Passenger, and Companionship.
Some variables are not available for every ship. When a variable is not available for a ship, all of its values are missing.
The variables No. of passengers, No. of women passengers, Ship Id, Year, Women and children first, and Ship size each take the same value for every observation (so do Quick, Women passengers/passengers, and Length of voyage, whose minimum and maximum are equal).
These constant variables describe the ship and its passengers as a whole rather than individual people. Because everyone in this data set was on the same ship, the ship-level variables do not vary.
The crew is not assigned a passenger class. Wherever Crew is 1, Passenger Class is NA (missing) rather than a class number; the 891 NA values in Passenger Class match the 891 people with Crew equal to 1.
Now let’s get some descriptive information about the people on the ship using crosstabs (frequency tables) made with the table() function. The variables of interest in the RMS_Titanic data set are Gender, Passenger Class, Crew, and Survival.
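Rather than scanning the summary() output by eye, you can ask R which columns are missing for every row. A minimal sketch on a made-up data frame with the same kind of layout: sapply() applies the test to each column and returns one TRUE/FALSE per column.

```r
# Toy data: Child and Companionship are entirely NA, Age has one NA
ship <- data.frame(
  Age           = c(42, 16, NA, 21),
  Child         = c(NA, NA, NA, NA),
  Companionship = c(NA, NA, NA, NA)
)

# TRUE for columns in which every single value is missing
all_missing <- sapply(ship, function(x) all(is.na(x)))
names(ship)[all_missing]
```

Age is not flagged because only one of its values is missing.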
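The same approach finds the constant columns: a column is constant when it has exactly one distinct non-missing value. The sketch uses a small made-up data frame; check.names = FALSE keeps the space in a name like Ship Id, as in the real data.

```r
# Toy data: Ship Id and Year are constant, Age varies
ship <- data.frame(
  `Ship Id` = c(8, 8, 8),
  Year      = c(1912, 1912, 1912),
  Age       = c(42, 16, 14),
  check.names = FALSE
)

# TRUE for columns with exactly one distinct non-missing value
# (an all-NA column has zero such values, so it is not counted as constant)
is_constant <- sapply(ship, function(x) length(unique(na.omit(x))) == 1)
names(ship)[is_constant]
```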
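One way to check the link between Crew and Passenger Class is to cross-tabulate Crew against is.na() of the class. In the sketch below (made-up rows), every crew member should fall in the TRUE (missing) column and every passenger in the FALSE column; on the real data the same call should show 891 crew members with a missing class.

```r
# Toy data: crew rows have no passenger class
ship <- data.frame(
  Crew              = c(0, 0, 1, 0, 1),
  `Passenger Class` = c(3, 1, NA, 2, NA),
  check.names = FALSE
)

# Rows: crew status; columns: is Passenger Class missing?
crew_vs_class <- table(Crew = ship$Crew,
                       ClassMissing = is.na(ship$`Passenger Class`))
crew_vs_class
```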
#Here is the first. You add the others.
table(RMS_Titanic$Gender)
##
## 0 1
## 1722 486
table(RMS_Titanic$`Passenger Class`)
##
## 1 2 3
## 324 285 708
table(RMS_Titanic$Crew)
##
## 0 1
## 1317 891
table(RMS_Titanic$Survival)
##
## 0 1
## 1496 712
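One-variable counts do not show how survival differed between groups. Giving table() two variables produces a two-way crosstab, and prop.table() with margin = 1 turns each row into proportions, here the share of each class that survived. The data frame below is made up; with the real data you would use RMS_Titanic$`Passenger Class` and RMS_Titanic$Survival.

```r
# Toy data: passenger class and survival (1 = survived), values made up
toy <- data.frame(
  `Passenger Class` = c(1, 1, 2, 3, 3, 3),
  Survival          = c(1, 0, 1, 0, 0, 1),
  check.names = FALSE
)

class_survival <- table(Class = toy$`Passenger Class`, Survived = toy$Survival)
class_survival

# Row proportions: each class's counts divided by that class's total
p <- prop.table(class_survival, margin = 1)
round(p, 2)
```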
We would also like to know the proportion of people aged 15 and under. The problem is that the Child variable is entirely missing, while Age is given in years. Let’s fill in the Child variable using the Age variable and then run the crosstab. (For your other ship, check whether this step is necessary and possible; you may need to do similar things with other variables.)
RMS_Titanic$Child <- as.numeric(RMS_Titanic$Age) <= 15
# Add the crosstab below
table(RMS_Titanic$Child)
##
## FALSE TRUE
## 2063 135
What does RMS_Titanic$Child <- as.numeric(RMS_Titanic$Age) <= 15 do? It replaces the empty Child variable with values computed from Age: TRUE for everyone aged 15 or younger and FALSE for everyone else. The 10 people with a missing Age also get NA for Child, and table() leaves them out by default, which is why the counts add up to 2063 + 135 = 2198 rather than 2208.
Write a paragraph describing the people who were on board the Titanic and what happened to them.
The 2208 people on board the Titanic came from different social classes: 1317 were passengers (324 in first class, 285 in second, and 708 in third), and 891 were crew members, who were not assigned a passenger class. There were many women on board, and 135 people were children aged 15 or under. Only 712 people (about 32%) survived. Many people in the upper class, especially women and children, survived over the men who were on board.
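To turn the Child counts into the proportion of people aged 15 and under, and to see how missing ages are handled, here is a small sketch on made-up ages. table() drops NA unless you pass useNA = "ifany"; prop.table() and mean(..., na.rm = TRUE) both give the share among people whose age is known. The last line shows an optional 0/1 coding that matches how the other variables in the data are stored.

```r
ages  <- c(42, 16, 14, NA, 8, 30)   # made-up ages, one missing
child <- ages <= 15                 # TRUE/FALSE; a missing age stays NA

table(child, useNA = "ifany")       # counts, including the NA
prop.table(table(child))            # proportions among known ages
mean(child, na.rm = TRUE)           # share of TRUE as a single number

child_01 <- as.numeric(child)       # optional 0/1 coding; NA stays NA
```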