One issue is that we want to include passenger class in our analysis, but we don’t want to lose all the data on the crew. Let’s look at the relationship between Passenger_Class and Crew as variables using crosstab().
crosstab(RMS_Titanic, col.vars = "Passenger_Class", format = "frequency")
## NULL
## 1 2 3
## N 324 285 708
crosstab(RMS_Titanic, col.vars = "Crew", format = "frequency")
## NULL
## 0 1
## N 1317 891
# Add a single variable crosstab for "Passenger Status"
# Add a crosstab of "Passenger_Class" (rows) by "Crew" (columns).
crosstab(RMS_Titanic, row.vars = "Passenger_Class", col.vars = "Crew", title = "Passenger Class vs. Crew", format = "frequency")
## Passenger Class vs. Crew
## 0
## 1 324
## 2 285
## 3 708
## Total N 1317
table(RMS_Titanic$Crew, RMS_Titanic$`Passenger_Class`, useNA = "ifany")
##
## 1 2 3 <NA>
## 0 324 285 708 0
## 1 0 0 0 891
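Every crew member has a missing value (NA) for Passenger_Class. Because of this, a crosstab of Crew vs. Passenger_Class drops all 891 crew members and keeps only the 1,317 passengers, so we cannot study the two variables together this way.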
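The sketch below shows exactly what gets dropped, using base R. It rebuilds only the Crew and Passenger_Class columns from the counts printed above and compares table() with and without useNA. The data frame name `toy` is hypothetical and is not part of RMS_Titanic.

```r
# Rebuild Crew and Passenger_Class from the counts in the table() output above.
# `toy` is a hypothetical stand-in for RMS_Titanic with only these two columns.
toy <- data.frame(
  Crew = c(rep(0, 324 + 285 + 708), rep(1, 891)),
  Passenger_Class = c(rep(1, 324), rep(2, 285), rep(3, 708), rep(NA, 891))
)

# By default, table() silently excludes any pair that contains an NA.
tab_default <- table(toy$Crew, toy$Passenger_Class)
# useNA = "ifany" adds an <NA> column, so the 891 crew members reappear.
tab_with_na <- table(toy$Crew, toy$Passenger_Class, useNA = "ifany")

sum(tab_default)  # 1317: passengers only
sum(tab_with_na)  # 2208: everyone on board
```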
Now we want to create a new variable called “Class.Combined” that combines the two variables. It adds “Crew” as a fourth category alongside passenger classes 1, 2, and 3.
RMS_Titanic$Class.Combined <- ifelse(RMS_Titanic$Crew == 1, "Crew", RMS_Titanic$Passenger_Class)
The code above uses the “ifelse” function to combine Passenger_Class and Crew. R looks at each person in the dataset and asks whether that person is a crew member (Crew == 1). If yes, the person is coded “Crew” in the new variable Class.Combined. If no, the person is a passenger, and Class.Combined takes that person’s Passenger_Class value (1, 2, or 3).
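Here is the same ifelse() logic applied to a handful of made-up rows. The vectors `crew` and `passenger_class` are hypothetical and are not taken from RMS_Titanic.

The sketch also shows a side effect worth knowing. Because “Crew” is a character string, ifelse() returns a character vector, so the class numbers become the strings "1", "2", and "3".

```r
# Hypothetical rows: three passengers and two crew members (whose class is NA).
crew            <- c(0, 0, 1, 0, 1)
passenger_class <- c(1, 3, NA, 2, NA)

# Same rule as Class.Combined: crew -> "Crew", otherwise keep the passenger class.
combined <- ifelse(crew == 1, "Crew", passenger_class)

print(combined)  # "1" "3" "Crew" "2" "Crew"
class(combined)  # "character"
table(combined)  # every row now has a category, so nobody is dropped
```

Because the result is character, glm() treats Class.Combined as a categorical variable. It uses "1" (first class) as the reference level because that value sorts first.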
crosstab(RMS_Titanic, row.vars ="Crew", col.vars = "Class.Combined", title="Everyone on board vs. Crew Status", format="frequency")
## Everyone on board vs. Crew Status
## 1 2 3 Crew
## 0 324 285 708 0
## 1 0 0 0 891
## Total N 324 285 708 891
Add a crosstab of “Class.Combined” with “Survival” as the row variable, and use format = "col_percent" to get column percentages.
crosstab(RMS_Titanic, row.vars ="Survival", col.vars = "Class.Combined", title="Survival by Class", format="col_frequency")
## Survival by Class
## 1 2 3 Crew
## 0 123 166 528 679
## 1 201 119 180 212
## Total N 324 285 708 891
crosstab(RMS_Titanic, row.vars ="Survival", col.vars = "Class.Combined", title="Survival by Class", format="col_percent")
## Survival by Class
## 1 2 3 Crew
## 0 38 58.2 74.6 76.2
## 1 62 41.8 25.4 23.8
## Total N 324 285 708 891
First-class passengers had a higher survival percentage (62 percent) than second-class passengers (41.8 percent), third-class passengers (25.4 percent), and crew members (23.8 percent). Crew members had the highest percentage of deaths (76.2 percent); among passengers, third class had the highest (74.6 percent). Survival percentages fall steadily from first to third class, with the crew slightly below third class, so class mattered for survival on the Titanic.
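The col_percent numbers can be reproduced from the col_frequency counts with base R’s prop.table(). With margin = 2, it divides each column by its column total. This sketch copies the counts printed above into a matrix rather than reading RMS_Titanic; the object names are illustrative.

```r
# Survival (rows) by Class.Combined (columns), counts copied from the
# col_frequency output above. matrix() fills column by column.
counts <- matrix(
  c(123, 201,   # 1st class: died, survived
    166, 119,   # 2nd class
    528, 180,   # 3rd class
    679, 212),  # Crew
  nrow = 2,
  dimnames = list(Survival = c("0", "1"),
                  Class.Combined = c("1", "2", "3", "Crew"))
)

# margin = 2: each column sums to 100 percent (column percentages).
col_pct <- round(100 * prop.table(counts, margin = 2), 1)
col_pct
```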
We have previously looked at survival by gender. This time let’s look at the combination of Class.Combined and Gender.
crosstab(RMS_Titanic, row.vars = "Survival", col.vars = c("Class.Combined", "Gender"), title = "Survival by Class and Gender", format = "col_percent")
## Survival by Class and Gender
## 1 0 2 0 3 0 Crew 0 1 1 2 1 3 1 Crew 1
## 0 65.2 85.6 84.8 77.9 3.5 11.4 51.2 13
## 1 34.8 14.4 15.2 22.1 96.5 88.6 48.8 87
## Total N 181 180 493 868 143 105 215 23
# Do the same table but change the order of the column variables.
crosstab(RMS_Titanic, row.vars = "Survival", col.vars = c("Gender", "Class.Combined"), title = "Survival by Class and Gender", format = "col_percent")
## Survival by Class and Gender
## 0 1 1 1 0 2 1 2 0 3 1 3 0 Crew 1 Crew
## 0 65.2 3.5 85.6 11.4 84.8 51.2 77.9 13
## 1 34.8 96.5 14.4 88.6 15.2 48.8 22.1 87
## Total N 181 143 180 105 493 215 868 23
# Do the same table but create a frequency table.
crosstab(RMS_Titanic, row.vars = "Survival", col.vars = c("Class.Combined", "Gender"), title = "Survival by Class and Gender", format = "col_frequency")
## Survival by Class and Gender
## 1 0 2 0 3 0 Crew 0 1 1 2 1 3 1 Crew 1
## 0 118 154 418 676 5 12 110 3
## 1 63 26 75 192 138 93 105 20
## Total N 181 180 493 868 143 105 215 23
In first class, men were far more likely to die than women: 65.2 percent of first-class men died, compared with 3.5 percent of first-class women.
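The same idea extends to three variables. This sketch uses base R rather than crosstab() and does three things:

- stores the Survival × Class.Combined × Gender frequencies shown above in a table;
- uses prop.table() with margin = c(2, 3) to get the percentage who died or survived within each class and gender combination;
- prints the result as a flat table with ftable().

```r
# Survival x Class.Combined x Gender counts copied from the col_frequency
# output above; each pair is (died, survived) for classes 1, 2, 3, Crew.
counts <- c(118,  63, 154, 26, 418,  75, 676, 192,   # Gender 0 (male)
              5, 138,  12, 93, 110, 105,   3,  20)   # Gender 1 (female)
tab <- as.table(array(
  counts, dim = c(2, 4, 2),
  dimnames = list(Survival = c("0", "1"),
                  Class.Combined = c("1", "2", "3", "Crew"),
                  Gender = c("0", "1"))
))

# margin = c(2, 3): percentages within each Class.Combined x Gender cell,
# matching the col_percent output above.
pct <- round(100 * prop.table(tab, margin = c(2, 3)), 1)
ftable(pct, row.vars = "Survival")
```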
Now let’s add a third variable, Child. Remember that for RMS_Titanic we have to create the Child variable.
RMS_Titanic$Child <- as.numeric(as.character(RMS_Titanic$Age)) <= 15
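Converting Age with as.numeric() has a well-known pitfall. If a column is stored as a factor, as.numeric() returns its internal level codes rather than its values. We have not checked how Age is stored in RMS_Titanic.

The sketch below uses a made-up four-value `age` vector to show the difference, and why as.numeric(as.character(...)) is the safer idiom.

```r
# Hypothetical ages stored as a factor, as can happen when text columns
# are converted to factors on import.
age <- factor(c("2", "30", "14", "45"))

# as.numeric() on a factor returns level codes, not ages. The levels sort
# as text ("14", "2", "30", "45"), so the codes are 2, 3, 1, 4.
child_wrong <- as.numeric(age) <= 15
# Converting to character first recovers the actual ages.
child_right <- as.numeric(as.character(age)) <= 15

child_wrong  # TRUE TRUE TRUE TRUE (every code is <= 15)
child_right  # TRUE FALSE TRUE FALSE
```

For ordinary age values that are already numeric or character, as.numeric(as.character(Age)) gives the same result as as.numeric(Age), so the extra step is harmless.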
# Run a crosstab with Class.Combined, Gender and Child as column variables and Survival as the row variable.
# You can run several variations if you want.
crosstab(RMS_Titanic, row.vars = "Survival", col.vars = c("Child", "Class.Combined", "Gender"), title = "Survival by Class, Gender and Child", format = "col_percent")
## Survival by Class, Gender and Child
## FALSE 0 TRUE 0 FALSE 1 TRUE 1
## 0 80.2 56.8 24.7 41
## 1 19.8 43.2 75.3 59
## Total N 1638 74 425 61
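In the column labels above, the first value is Child (FALSE = adult, TRUE = age 15 or under) and the second is Gender (0 = male, 1 = female). The printed table breaks survival down by Child and Gender only; Class.Combined does not appear in its columns.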
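Overall (using all of the tables above as needed):
#### Does being a child increase or decrease someone’s chances of survival compared to not being a child?
On the Titanic, being a child generally increased a person’s chances of survival. Among males, 43.2 percent of children survived compared with 19.8 percent of adults. Among females the table shows the opposite: 59 percent of girls survived versus 75.3 percent of women. The child advantage is therefore clearest for boys. Adult males had the lowest survival percentage of these four groups.
#### Does being female (Gender = 1) increase or decrease someone’s chances of survival compared to being a male?
Being female increased a person’s chances of survival. In every class, and among the crew, women survived at a much higher rate than men. For example, 96.5 versus 34.8 percent survived in first class, and 48.8 versus 15.2 percent in third class.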
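To compare men and women overall, the class × gender frequencies can be collapsed over class. This sketch copies the counts from the col_frequency output into a base-R table. It uses margin.table() to sum over Class.Combined and prop.table() for the percentages; the object names are illustrative.

```r
# Survival x Class.Combined x Gender counts copied from the col_frequency
# output above; each pair is (died, survived) for classes 1, 2, 3, Crew.
counts <- c(118,  63, 154, 26, 418,  75, 676, 192,   # Gender 0 (male)
              5, 138,  12, 93, 110, 105,   3,  20)   # Gender 1 (female)
tab <- as.table(array(
  counts, dim = c(2, 4, 2),
  dimnames = list(Survival = c("0", "1"),
                  Class.Combined = c("1", "2", "3", "Crew"),
                  Gender = c("0", "1"))
))

# Collapse over class: keep dimension 1 (Survival) and dimension 3 (Gender).
by_gender  <- margin.table(tab, c(1, 3))
gender_pct <- round(100 * prop.table(by_gender, margin = 2), 1)

by_gender   # 356 men and 356 women survived
gender_pct  # percent survived: 20.7 (men) vs 73.3 (women)
```

The same number of men and women survived (356 each). However, there were 1,722 men and 486 women on board, so the survival percentages are very different. This is why the comparison is made with percentages rather than counts.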
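Compared to first class, being a crew member decreased a person’s chances of survival: 23.8 percent of crew members survived, compared with 62 percent of first-class passengers.
Compared to first class, being a third-class passenger also decreased a person’s chances of survival: only 25.4 percent of third-class passengers survived.
Compared to third class, being a crew member increased a person’s chances of survival once gender is taken into account. Among men, 22.1 percent of the crew survived versus 15.2 percent of third-class passengers. Among women, 87 percent of the crew survived versus 48.8 percent of third-class passengers. Without splitting by gender, crew survival was slightly lower (23.8 versus 25.4 percent), because 868 of the 891 crew members were men.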
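The crosstab shows crew survival slightly below third class (23.8 vs. 25.4 percent), yet within each gender the crew did better. A few lines of arithmetic on the counts above show how both can be true. This reversal on pooling is an instance of what statisticians call Simpson’s paradox. The variable names below are illustrative.

```r
# Survivors and totals copied from the col_frequency tables above.
survived <- c(third_male = 75,  third_female = 105, crew_male = 192, crew_female = 20)
total    <- c(third_male = 493, third_female = 215, crew_male = 868, crew_female = 23)

# Percent survived within each gender: the crew beat third class for both sexes.
within_gender <- round(100 * survived / total, 1)

# Percent survived pooling men and women: now the crew look slightly worse,
# because almost all crew members were men, who survived at low rates.
third <- c("third_male", "third_female")
crew  <- c("crew_male", "crew_female")
overall <- round(100 * c(third = sum(survived[third]) / sum(total[third]),
                         crew  = sum(survived[crew])  / sum(total[crew])), 1)

within_gender  # 15.2 48.8 22.1 87.0
overall        # third 25.4, crew 23.8
```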
Does this seem to be getting a bit complex? That is one reason why sociologists often switch to a regression model. These models may seem complex at first, but they can be easier to read than a complex crosstab.
Here’s how to run a logistic regression, a type of generalized linear model. Just run it.
results <- glm(Survival ~ Class.Combined + Gender + Child, data = RMS_Titanic, family = binomial(link = "logit"))
coef(results)
## (Intercept) Class.Combined2 Class.Combined3
## -0.3807736 -1.0247356 -1.8974894
## Class.CombinedCrew Gender ChildTRUE
## -0.8610766 2.5212848 0.8769063
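Logistic regression coefficients are on the log-odds scale, which is hard to read directly. A common next step is to exponentiate them into odds ratios; in a live session this would be exp(coef(results)). The sketch below copies the printed coefficients instead, so it runs without RMS_Titanic.

```r
# Coefficients (log-odds) copied from the coef(results) output above.
log_odds <- c("(Intercept)"      = -0.3807736,
              Class.Combined2    = -1.0247356,
              Class.Combined3    = -1.8974894,
              Class.CombinedCrew = -0.8610766,
              Gender             =  2.5212848,
              ChildTRUE          =  0.8769063)

# exp() turns a log-odds coefficient into an odds ratio:
# > 1 means higher odds of survival than the reference group, < 1 means lower.
odds_ratios <- round(exp(log_odds), 2)
odds_ratios
```

For example, the Gender odds ratio of about 12.44 means that, holding class and child status constant, women’s odds of survival were roughly 12 times men’s. Each odds ratio is relative to its reference category: first class for Class.Combined, male for Gender, and adult for Child.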
We are going to discuss this together, but for right now:
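The sign of ChildTRUE is positive (0.877): holding class and gender constant, children had higher odds of survival than adults.
The sign of Gender (1 = female) is positive (2.521): women had higher odds of survival than men.
The sign of Class.CombinedCrew is negative (-0.861): crew members had lower odds of survival than first-class passengers, the reference category.
The sign of Class.Combined3 is negative (-1.897): third-class passengers had lower odds of survival than first-class passengers. Because this is the most negative class coefficient, they had the lowest odds of any class.
The absolute value of the Class.CombinedCrew coefficient (0.861) is smaller than that of Class.Combined3 (1.897). Holding gender and child status constant, crew members therefore had higher odds of survival than third-class passengers.
#### How do these answers compare with the answers about the cross tabulation?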