Now that we have our data sets set up, let's take a look at them and start to analyze what happened (and learn some R basics along the way).
Let’s focus on the Titanic.
Start by loading the dataset that you previously saved.
load(file.path("~", "data", "MV_Princess_Victoria.Rda"))
# Do the same thing but for your ship.
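As a minimal sketch of what load() does, the example below saves a small made-up data frame to a temporary file and loads it back. The object name toy_ship and its values are hypothetical and are not part of the ship data; save(), load(), file.path() and tempdir() are base R functions.

```r
# Hypothetical toy data standing in for a ship's data set.
toy_ship <- data.frame(Crew = c(1, 0, 0), Survival = c(0, 1, 0))

# Save it to a temporary .Rda file, then remove it from the workspace.
rda_path <- file.path(tempdir(), "toy_ship.Rda")
save(toy_ship, file = rda_path)
rm(toy_ship)

# load() restores objects under the names they were saved with
# and invisibly returns those names.
loaded_names <- load(rda_path)
print(loaded_names)
print(nrow(toy_ship))
```

Because load() brings the object back under its saved name, the data set loaded above is referred to as MV_Princess_Victoria in the rest of the code.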
Each data set we downloaded contains one row per person on board a given ship, in this case the RMS Titanic and your ship.
Find some basic information about the data using the str() and summary() functions. Put the name of the data set inside the parentheses: RMS_Titanic for the Titanic, or the name of your ship's data set, as in the example here.
str(MV_Princess_Victoria)
## Classes 'tbl_df', 'tbl' and 'data.frame': 179 obs. of 20 variables:
## $ Id_14 : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Ship Id : num 14 14 14 14 14 14 14 14 14 14 ...
## $ Year : num 1953 1953 1953 1953 1953 ...
## $ Nationality of the Ship : chr "U.K" "U.K" "U.K" "U.K" ...
## $ Women and children first : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Quick : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Cause : chr "Weather" "Weather" "Weather" "Weather" ...
## $ No. of passengers : num 129 129 129 129 129 129 129 129 129 129 ...
## $ No. of women passengers : num 26 26 26 26 26 26 26 26 26 26 ...
## $ Women passengers/passengers: num 0.202 0.202 0.202 0.202 0.202 0.202 0.202 0.202 0.202 0.202 ...
## $ Ship size : num 179 179 179 179 179 179 179 179 179 179 ...
## $ Length of voyage : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Gender : num 0 1 0 0 0 0 0 1 1 1 ...
## $ Age : num 25 39 42 33 NA NA 53 NA NA NA ...
## $ Child : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Crew : num 1 1 1 1 1 1 0 1 1 1 ...
## $ Passenger Class : logi NA NA NA NA NA NA ...
## $ Nationality of Passenger : chr "Northern Ireland" "Northern Ireland" "Northern Ireland" "Northern Ireland" ...
## $ Companionship : logi NA NA NA NA NA NA ...
## $ Survival : num 0 0 1 0 0 0 0 0 0 0 ...
summary(MV_Princess_Victoria)
## Id_14 Ship Id Year Nationality of the Ship
## Min. : 1.0 Min. :14 Min. :1953 Length:179
## 1st Qu.: 45.5 1st Qu.:14 1st Qu.:1953 Class :character
## Median : 90.0 Median :14 Median :1953 Mode :character
## Mean : 90.0 Mean :14 Mean :1953
## 3rd Qu.:134.5 3rd Qu.:14 3rd Qu.:1953
## Max. :179.0 Max. :14 Max. :1953
##
## Women and children first Quick Cause No. of passengers
## Min. :0 Min. :0 Length:179 Min. :129
## 1st Qu.:0 1st Qu.:0 Class :character 1st Qu.:129
## Median :0 Median :0 Mode :character Median :129
## Mean :0 Mean :0 Mean :129
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:129
## Max. :0 Max. :0 Max. :129
##
## No. of women passengers Women passengers/passengers Ship size
## Min. :26 Min. :0.202 Min. :179
## 1st Qu.:26 1st Qu.:0.202 1st Qu.:179
## Median :26 Median :0.202 Median :179
## Mean :26 Mean :0.202 Mean :179
## 3rd Qu.:26 3rd Qu.:0.202 3rd Qu.:179
## Max. :26 Max. :0.202 Max. :179
##
## Length of voyage Gender Age Child
## Min. :1 Min. :0.0000 Min. : 2.00 Min. :0.00000
## 1st Qu.:1 1st Qu.:0.0000 1st Qu.:22.00 1st Qu.:0.00000
## Median :1 Median :0.0000 Median :28.00 Median :0.00000
## Mean :1 Mean :0.1732 Mean :31.27 Mean :0.02809
## 3rd Qu.:1 3rd Qu.:0.0000 3rd Qu.:37.00 3rd Qu.:0.00000
## Max. :1 Max. :1.0000 Max. :70.00 Max. :1.00000
## NA's :86 NA's :1
## Crew Passenger Class Nationality of Passenger Companionship
## Min. :0.0000 Mode:logical Length:179 Mode:logical
## 1st Qu.:0.0000 NA's:179 Class :character NA's:179
## Median :0.0000 Mode :character
## Mean :0.2793
## 3rd Qu.:1.0000
## Max. :1.0000
##
## Survival
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.2458
## 3rd Qu.:0.0000
## Max. :1.0000
##
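One way to read the summary() output above: for a variable coded 0/1, the mean is the proportion of people coded 1. The sketch below rebuilds a Crew-like vector under the assumption that 50 of the 179 people are coded 1, which is consistent with the reported mean of 0.2793; the vector is a stand-in, not the real column.

```r
# Assumed stand-in for the Crew column: 129 people coded 0, 50 coded 1.
crew <- rep(c(0, 1), times = c(129, 50))

# The mean of a 0/1 variable equals the share of 1s.
crew_share <- mean(crew)
print(round(crew_share, 4))
```

The same reading applies to Gender, Child and Survival in the summary.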
You will probably notice some odd things.
Not every variable is available for every ship. When a variable is not available, all of its values will be missing (NA). Here, for example, Passenger Class and Companionship are NA for all 179 people.
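A quick way to find variables that are missing for everyone is to count the NA values in each column. This is a sketch on a small made-up data frame (the name toy and its values are hypothetical); the same two lines should work on a ship data set by replacing toy with its name.

```r
# Hypothetical toy data: one column is entirely missing, one partly missing.
toy <- data.frame(
  Age = c(25, 39, NA, 33),
  Survival = c(0, 0, 1, 0),
  `Passenger Class` = c(NA, NA, NA, NA),
  check.names = FALSE  # keep the space in the column name
)

# Count missing values per column, then flag columns where every value is NA.
na_counts <- colSums(is.na(toy))
all_missing <- names(toy)[na_counts == nrow(toy)]

print(na_counts)
print(all_missing)
```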
Now let's get some descriptive information about the people on the ship using the frequency() function. The variables we are interested in are Gender, Crew, Survival, and any other variables in the data set that are nominal or ordinal and have data.
Remember the code will look like this: frequency(MV_Princess_Victoria$Crew)
frequency(MV_Princess_Victoria$Crew)
## Values Freq Percent
## 0 129 72.1
## 1 50 27.9
## Total 179 100
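The frequency() function used here is presumably supplied by the course's package; base R's own frequency() is meant for time series and does something different. As a rough base R equivalent, table() and prop.table() give the same counts and percentages. The crew vector below is rebuilt from the counts in the output above rather than read from the real data.

```r
# Rebuild the Crew variable from the counts shown above:
# 129 people coded 0 and 50 coded 1.
crew <- rep(c(0, 1), times = c(129, 50))

counts <- table(crew)                           # frequency of each value
percents <- round(100 * prop.table(counts), 1)  # share of the total

print(counts)
print(percents)
```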
Are there any interval variables? If so, run the summary() function on them.
summary(MV_Princess_Victoria$Age)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 2.00 22.00 28.00 31.27 37.00 70.00 86
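Note the NA's column: 86 of the 179 ages are missing. summary() leaves them out automatically, but most other functions need to be told to do so. The sketch below uses only the first ten Age values visible in the str() output, so its result is for illustration and is not the ship-wide mean.

```r
# The first ten Age values shown in the str() output.
age <- c(25, 39, 42, 33, NA, NA, 53, NA, NA, NA)

mean_with_na <- mean(age)                # NA, because missing values propagate
mean_known <- mean(age, na.rm = TRUE)    # average of the known ages only
n_missing <- sum(is.na(age))             # how many ages are missing

print(mean_with_na)
print(mean_known)
print(n_missing)
```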
Now run at least three cross tabs. The dependent variable (row.vars) should be Survival for all of them. The two shared variables are Crew and Gender. Pick one other variable describing individual people.
Remember that the code will look like: crosstab(DATASET, row.vars="", col.vars="", title="", format="col_percent")
crosstab(MV_Princess_Victoria, row.vars="Survival", col.vars="Crew", title="Survival by Crew", format="col_percent")
## Survival by Crew
## 0 1
## 0 73.6 80
## 1 26.4 20
## Total N 129 50
crosstab(MV_Princess_Victoria, row.vars="Survival", col.vars="Gender", title="Survival by Gender", format="col_percent")
## Survival by Gender
## 0 1
## 0 70.3 100
## 1 29.7 0
## Total N 148 31
crosstab(MV_Princess_Victoria, row.vars="Crew", col.vars="Gender", title="Crew by Gender", format="col_percent")
## Crew by Gender
## 0 1
## 0 69.6 83.9
## 1 30.4 16.1
## Total N 148 31
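To see what format="col_percent" is doing, here is a sketch of a column-percent cross tab in base R using table() and prop.table(). It is not the crosstab() function itself. The counts are an assumption, back-calculated from the Survival by Crew table above (73.6 percent of 129 is about 95, and 80 percent of 50 is 40).

```r
# Counts back-calculated from the "Survival by Crew" output (an assumption).
people <- data.frame(
  Crew     = rep(c(0, 0, 1, 1), times = c(95, 34, 40, 10)),
  Survival = rep(c(0, 1, 0, 1), times = c(95, 34, 40, 10))
)

# Rows are the dependent variable, columns the independent variable.
counts <- table(Survival = people$Survival, Crew = people$Crew)

# margin = 2 computes percentages within each column, so columns sum to 100.
col_percent <- round(100 * prop.table(counts, margin = 2), 1)

print(col_percent)
print(colSums(counts))  # corresponds to the "Total N" row
```

Column percentages let you compare groups of different sizes: each column answers "of the people in this group, what share died and what share survived?"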
Write a paragraph describing the people who were on board your ship and what happened to them.
According to our data set, 73.6 percent of the passengers on the MV Princess Victoria died and 26.4 percent survived. Among crew members, 80 percent died and 20 percent survived. A higher percentage of crew members died than passengers, possibly because the crew assisted passengers in putting on lifejackets. They also worked to prepare the lifeboats for launching, although the chances of launching lifeboats successfully in the gale were very slim, and they had to attempt it while the vessel was listing badly due to flooding. Considering survival and gender, 70.3 percent of males died and 100 percent of females died. More males survived than females, partly because most passengers and crew members were male, but also because the captain did not give a women-and-children-first order for the lifeboats that were able to launch.