Now that we have our data sets set up, let’s take a look at them and start to analyze what happened (and learn some R basics along the way).
Let’s focus on the Titanic.
Start by loading the dataset that you previously saved.
load(paste("~","data/RMS_Titanic.Rda", sep="/"))
# Do the same thing but for your ship.
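As a hedged aside, load() restores objects under the names they were saved with and invisibly returns those names, which is a handy way to confirm what just appeared in your workspace. The sketch below uses a small made-up data frame (toy_ship, a hypothetical name) and a temporary file rather than the real data. It also shows file.path(), a base R alternative to paste(..., sep="/") for building paths.

```r
# Toy stand-in for a saved ship data set (hypothetical values).
toy_ship <- data.frame(Crew = c(0, 1, 0), Survival = c(1, 0, 0))

# Build a path with file.path() and save the object to a temporary folder.
rda_path <- file.path(tempdir(), "toy_ship.Rda")
save(toy_ship, file = rda_path)

# Remove the object, then load it back; load() returns the restored names.
rm(toy_ship)
loaded_names <- load(rda_path)
print(loaded_names)
print(exists("toy_ship"))
```

If the name printed by load() is not the one you expected, use the printed name in str() and summary().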
The data we downloaded contain one row per person on a given ship, in this case the RMS Titanic and your ship.
Find some basic information about the data using the str() function and the summary() function on the RMS_Titanic data set, then do the same for your ship. To do that, put the name of the data set inside the parentheses.
str(RMS_Titanic)
## Classes 'tbl_df', 'tbl' and 'data.frame': 2208 obs. of 20 variables:
## $ Id_8 : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Ship Id : num 8 8 8 8 8 8 8 8 8 8 ...
## $ Year : num 1912 1912 1912 1912 1912 ...
## $ Nationality of the Ship : chr "U.K" "U.K" "U.K" "U.K" ...
## $ Women and children first : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Quick : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Cause : chr "Collision" "Collision" "Collision" "Collision" ...
## $ No. of passengers : num 1317 1317 1317 1317 1317 ...
## $ No. of women passengers : num 463 463 463 463 463 463 463 463 463 463 ...
## $ Women passengers/passengers: num 0.352 0.352 0.352 0.352 0.352 0.352 0.352 0.352 0.352 0.352 ...
## $ Ship size : num 2208 2208 2208 2208 2208 ...
## $ Length of voyage : num 5 5 5 5 5 5 5 5 5 5 ...
## $ Gender : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Age : num 42 16 14 21 30 33 30 26 26 20 ...
## $ Child : logi NA NA NA NA NA NA ...
## $ Crew : num 0 0 0 1 0 1 0 0 1 1 ...
## $ Passenger Class : num 3 3 3 NA 2 NA 3 3 NA NA ...
## $ Nationality of Passenger : logi NA NA NA NA NA NA ...
## $ Companionship : logi NA NA NA NA NA NA ...
## $ Survival : num 0 0 0 0 0 0 0 0 0 0 ...
summary(RMS_Titanic)
## Id_8 Ship Id Year Nationality of the Ship
## Min. : 1.0 Min. :8 Min. :1912 Length:2208
## 1st Qu.: 552.8 1st Qu.:8 1st Qu.:1912 Class :character
## Median :1104.5 Median :8 Median :1912 Mode :character
## Mean :1104.5 Mean :8 Mean :1912
## 3rd Qu.:1656.2 3rd Qu.:8 3rd Qu.:1912
## Max. :2208.0 Max. :8 Max. :1912
##
## Women and children first Quick Cause No. of passengers
## Min. :1 Min. :0 Length:2208 Min. :1317
## 1st Qu.:1 1st Qu.:0 Class :character 1st Qu.:1317
## Median :1 Median :0 Mode :character Median :1317
## Mean :1 Mean :0 Mean :1317
## 3rd Qu.:1 3rd Qu.:0 3rd Qu.:1317
## Max. :1 Max. :0 Max. :1317
##
## No. of women passengers Women passengers/passengers Ship size
## Min. :463 Min. :0.352 Min. :2208
## 1st Qu.:463 1st Qu.:0.352 1st Qu.:2208
## Median :463 Median :0.352 Median :2208
## Mean :463 Mean :0.352 Mean :2208
## 3rd Qu.:463 3rd Qu.:0.352 3rd Qu.:2208
## Max. :463 Max. :0.352 Max. :2208
##
## Length of voyage Gender Age Child
## Min. :5 Min. :0.0000 Min. : 0.00 Mode:logical
## 1st Qu.:5 1st Qu.:0.0000 1st Qu.:22.00 NA's:2208
## Median :5 Median :0.0000 Median :29.00
## Mean :5 Mean :0.2201 Mean :29.91
## 3rd Qu.:5 3rd Qu.:0.0000 3rd Qu.:36.00
## Max. :5 Max. :1.0000 Max. :74.00
## NA's :10
## Crew Passenger Class Nationality of Passenger Companionship
## Min. :0.0000 Min. :1.000 Mode:logical Mode:logical
## 1st Qu.:0.0000 1st Qu.:2.000 NA's:2208 NA's:2208
## Median :0.0000 Median :3.000
## Mean :0.4035 Mean :2.292
## 3rd Qu.:1.0000 3rd Qu.:3.000
## Max. :1.0000 Max. :3.000
## NA's :891
## Survival
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.3225
## 3rd Qu.:1.0000
## Max. :1.0000
##
You will probably notice some odd things.
Note that some variables are not available for every ship. When a variable is not available, all of its values are missing (NA). In the Titanic data this is the case for Child, Nationality of Passenger, and Companionship, each of which shows 2208 NA's.
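One way to spot such variables without reading the whole summary() output is to count the missing values in each column. This is a minimal sketch on a made-up data frame (toy_ship and its values are hypothetical); the same two steps should work on RMS_Titanic or on your ship.

```r
# Toy data frame (hypothetical values) with one column that is entirely missing.
toy_ship <- data.frame(
  Age = c(42, 16, NA, 21),
  Child = c(NA, NA, NA, NA),
  Survival = c(0, 0, 1, 0)
)

# Count missing values per column.
na_counts <- colSums(is.na(toy_ship))
print(na_counts)

# Columns where every value is missing carry no information for this ship.
all_missing <- names(na_counts)[na_counts == nrow(toy_ship)]
print(all_missing)
```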
Now let’s get some descriptive information about the people on the ship using the frequency() function. The variables we are interested in from the RMS_Titanic data set are Gender, Crew, Survival, and any other variables that are nominal or ordinal and have data.
Remember the code will look like this:
frequency(RMS_Titanic$Crew)
## Values Freq Percent
## 0 1317 59.6
## 1 891 40.4
## Total 2208 100
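The frequency() function used here appears to come from the course's package rather than from base R (base R's stats::frequency() does something unrelated, for time series). If it is not available, a similar table can be sketched with table() and prop.table(). The vector below is rebuilt from the counts printed above; the names crew and freq_table are just for illustration.

```r
# Rebuild the Crew column from the counts printed above: 1317 zeros, 891 ones.
crew <- rep(c(0, 1), times = c(1317, 891))

# table() counts each value; prop.table() turns the counts into proportions.
counts <- table(crew)
percents <- round(100 * prop.table(counts), 1)

freq_table <- data.frame(
  Values = names(counts),
  Freq = as.vector(counts),
  Percent = as.vector(percents)
)
print(freq_table)
```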
Are there any interval variables? If so, run the summary() function for them.
summary(RMS_Titanic$Age)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 22.00 29.00 29.91 36.00 74.00 10
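The NA's column matters when you compute statistics yourself: summary() sets missing values aside, but functions such as mean() return NA unless told to drop them. A small sketch with made-up ages (the vector is hypothetical, not taken from the data set):

```r
# Hypothetical ages with one missing value.
age <- c(42, 16, 14, NA, 30)

# summary() reports the NA count separately and ignores NAs in the statistics.
print(summary(age))

# mean() returns NA unless missing values are dropped explicitly.
mean_with_na <- mean(age)
mean_without_na <- mean(age, na.rm = TRUE)
print(mean_with_na)
print(mean_without_na)
```

For variables whose names contain spaces, such as Passenger Class, wrap the name in backticks after the $ sign.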
Now run at least three cross tabs. The dependent variable (row.vars) should be Survival for all of them. The two shared variables are Crew and Gender. Pick one other variable describing individual people.
Remember that the code will look like: crosstab(DATASET, row.vars="Survival", col.vars="Crew", title="Survival by Crew Status", format="col_percent")
crosstab(RMS_Titanic, row.vars="Survival", col.vars="Crew", title="Survival by Crew", format="col_percent")
## Survival by Crew
## 0 1
## 0 62 76.2
## 1 38 23.8
## Total N 1317 891
crosstab(RMS_Titanic, row.vars="Survival", col.vars="Gender", title="Survival by Gender", format="col_percent")
## Survival by Gender
## 0 1
## 0 79.3 26.7
## 1 20.7 73.3
## Total N 1722 486
crosstab(RMS_Titanic, row.vars="Crew", col.vars="Gender", title="Crew by Gender", format="col_percent")
## Crew by Gender
## 0 1
## 0 49.6 95.3
## 1 50.4 4.7
## Total N 1722 486
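To see what format="col_percent" computes, here is a rough base R equivalent of the Survival by Crew table using table() and prop.table(). The crosstab() function itself is assumed to come from the course's package. The cell counts below are back-calculated from the printed percentages and column totals, so treat them as an assumption rather than as values read from the data set.

```r
# Counts back-calculated from the printed column percentages (an assumption):
# passengers (Crew = 0): 817 died, 500 survived; crew (Crew = 1): 679 died, 212 survived.
toy_titanic <- data.frame(
  Crew = rep(c(0, 0, 1, 1), times = c(817, 500, 679, 212)),
  Survival = rep(c(0, 1, 0, 1), times = c(817, 500, 679, 212))
)

# Rows are the dependent variable (Survival), columns the independent one (Crew).
counts <- table(Survival = toy_titanic$Survival, Crew = toy_titanic$Crew)

# margin = 2 makes each column sum to 100 percent.
col_percent <- round(100 * prop.table(counts, margin = 2), 1)
print(col_percent)
print(colSums(counts))
```

Because each column sums to 100, you compare across a row: the share surviving among passengers against the share surviving among crew.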
Write a paragraph describing the people who were on board your ship and what happened to them.
According to our dataset, 62 percent of the passengers on the Titanic died and 38 percent survived, while 76.2 percent of the crew died and 23.8 percent survived. A larger share of crew members died than passengers, probably because passengers had priority for the lifeboats. Considering survival and gender, 79.3 percent of males died compared with 26.7 percent of females. The death rate was higher for males partly because most crew members were male (50.4 percent of the males on board were crew, compared with 4.7 percent of the females) and partly because women and children had priority for the lifeboats.
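When writing up results like these, keep in mind that a percentage and a count can point in different directions when the groups differ in size. The arithmetic below converts the printed column percentages back into approximate numbers of deaths. The variable names are illustrative, and the results are only as precise as the rounded percentages.

```r
# Group sizes and death percentages as printed in the crosstab output.
n_passengers <- 1317
n_crew <- 891
pct_died_passengers <- 62.0
pct_died_crew <- 76.2

# Approximate counts; rounding in the printed percentages limits precision.
deaths_passengers <- round(n_passengers * pct_died_passengers / 100)
deaths_crew <- round(n_crew * pct_died_crew / 100)
print(c(passengers = deaths_passengers, crew = deaths_crew))
```

The crew had the higher death rate, while the larger passenger group accounts for more deaths in absolute terms.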