One of the issues we have is that we want to include passenger class in our analysis but we don’t want to lose all the data on the crew. Let’s look at the relationship between Passenger Class and Crew as variables using crosstab().

crosstab(RMS_Titanic, col.vars = "Passenger_Class", format = "frequency")
## NULL
##   1   2   3  
## N 324 285 708
crosstab(RMS_Titanic, col.vars = "Crew", format = "frequency")
## NULL
##   0    1  
## N 1317 891
# Add a single variable crosstab for "Passenger Status"


# Add a crosstab of "Crew" and "Passenger Class" with "Crew" as the row variable.
crosstab(RMS_Titanic, row.vars = "Passenger_Class", col.vars = "Crew", title = "Passenger Class vs. Crew" , format = "frequency" )
## Passenger Class vs. Crew
##         0   
## 1       324 
## 2       285 
## 3       708 
## Total N 1317
table(RMS_Titanic$Crew, RMS_Titanic$`Passenger_Class`, useNA = "ifany")
##    
##       1   2   3 <NA>
##   0 324 285 708    0
##   1   0   0   0  891

Explain how the current variable “Passenger Class” limits our ability to analyze the data.

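Because Passenger_Class is missing (NA) for all 891 crew members, a crosstab of Crew vs. Passenger Class leaves the crew out entirely, so we cannot compare the crew with the passenger classes.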
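
The effect of the missing values can be reproduced on a small made-up data frame. This is only a sketch: the rows of `toy` are hypothetical, not the Titanic data. Base R's table() drops NA values by default, so the crew rows vanish unless useNA is set.

```r
# Hypothetical toy data: crew members have no passenger class (NA).
toy <- data.frame(
  Crew            = c(0, 0, 0, 1, 1),
  Passenger_Class = c(1, 2, 3, NA, NA)
)

# Default behaviour: rows with NA in either variable are dropped.
dropped <- table(toy$Crew, toy$Passenger_Class)

# useNA = "ifany" keeps an NA column, so the crew stay visible.
kept <- table(toy$Crew, toy$Passenger_Class, useNA = "ifany")

sum(dropped)  # counts only the passengers
sum(kept)     # counts everyone, crew included
```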

Data Wrangling

Now we want to create a new variable called “Class.Combined” that combines the two variables by adding “Crew” as a status.

RMS_Titanic$Class.Combined<- ifelse(RMS_Titanic$Crew == 1, "Crew", RMS_Titanic$`Passenger_Class`)

Explain in words what the above code does.

The code above uses the ifelse() function to combine Passenger_Class and Crew into one variable. R looks at each person in the dataset and asks whether that person is a crew member (Crew == 1). If yes, the person is labeled “Crew” in our new variable, Class.Combined. If no, the person is a passenger, and R copies that person’s Passenger_Class value into Class.Combined.
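
The same recode can be tried on a few made-up rows to see what ifelse() returns. This is a minimal sketch; the `toy` data frame is hypothetical and only mimics the two columns used above.

```r
# Hypothetical toy data, not the real RMS_Titanic rows.
toy <- data.frame(
  Crew            = c(0, 0, 1, 0, 1),
  Passenger_Class = c(1, 3, NA, 2, NA)
)

# For each row: if Crew == 1 use the label "Crew",
# otherwise copy that row's passenger class.
toy$Class.Combined <- ifelse(toy$Crew == 1, "Crew", toy$Passenger_Class)

toy$Class.Combined         # character vector: the classes become "1", "2", "3"
table(toy$Class.Combined)  # no NA values remain
```

Because one branch is text, the whole result is stored as text, which is why R later treats Class.Combined as a set of categories rather than as a number.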

Add a crosstab of “Crew” and “Class.Combined” with “Crew” as the row variable.

crosstab(RMS_Titanic, row.vars ="Crew", col.vars = "Class.Combined", title="Everyone on board vs. Crew Status", format="frequency")
## Everyone on board vs. Crew Status
##         1   2   3   Crew
## 0       324 285 708 0   
## 1       0   0   0   891 
## Total N 324 285 708 891

Add a crosstab of “Class.Combined” with “Survival” as the row variable and format = “col_percent” to get column percents.

crosstab(RMS_Titanic, row.vars ="Survival", col.vars = "Class.Combined", title="Survival by Class", format="col_frequency")
## Survival by Class
##         1   2   3   Crew
## 0       123 166 528 679 
## 1       201 119 180 212 
## Total N 324 285 708 891
crosstab(RMS_Titanic, row.vars ="Survival", col.vars = "Class.Combined", title="Survival by Class", format="col_percent")
## Survival by Class
##         1   2    3    Crew
## 0       38  58.2 74.6 76.2
## 1       62  41.8 25.4 23.8
## Total N 324 285  708  891

Write a paragraph summarizing the relationship between Class.Combined and Survival.

First class passengers had a greater percentage of survival (62 percent) than 2nd class passengers (41.8 percent), 3rd class passengers (25.4 percent), and crew members (23.8 percent). Crew members had the highest percentage of deaths (76.2 percent) of any group, and among passengers the class with the highest percentage of deaths was third class (74.6 percent). Survival percentages decrease as class status goes down, with the crew just below third class, so class mattered for survival on the Titanic.
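
The column percents can also be checked by hand from the frequency table. The sketch below uses base R's prop.table() instead of crosstab(), with the counts typed in from the output above; the object names `counts` and `col_pct` are just for illustration.

```r
# Counts copied from the frequency table above (rows: 0 = died, 1 = survived).
counts <- matrix(
  c(123, 201,   # 1st class
    166, 119,   # 2nd class
    528, 180,   # 3rd class
    679, 212),  # crew
  nrow = 2,
  dimnames = list(Survival = c("0", "1"),
                  Class.Combined = c("1", "2", "3", "Crew"))
)

# margin = 2 divides each cell by its column total, giving column percents.
col_pct <- round(100 * prop.table(counts, margin = 2), 1)
col_pct
```

Each column adds up to 100, so the percents compare survival across groups of very different sizes.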

We have previously looked at survival by gender. This time let’s look at the combination of Class.Combined and Gender.

crosstab(RMS_Titanic, row.vars = "Survival", col.vars = c("Class.Combined", "Gender"), title = "Survival by Class and Gender", format = "col_percent")
## Survival by Class and Gender
##         1 0  2 0  3 0  Crew 0 1 1  2 1  3 1  Crew 1
## 0       65.2 85.6 84.8 77.9   3.5  11.4 51.2 13    
## 1       34.8 14.4 15.2 22.1   96.5 88.6 48.8 87    
## Total N 181  180  493  868    143  105  215  23
# Do the same table but change the order of the column variables.
crosstab(RMS_Titanic, row.vars = "Survival", col.vars = c("Gender", "Class.Combined"), title = "Survival by Class and Gender", format = "col_percent")
## Survival by Class and Gender
##         0 1  1 1  0 2  1 2  0 3  1 3  0 Crew 1 Crew
## 0       65.2 3.5  85.6 11.4 84.8 51.2 77.9   13    
## 1       34.8 96.5 14.4 88.6 15.2 48.8 22.1   87    
## Total N 181  143  180  105  493  215  868    23
#Do the same table but create a frequency table.
crosstab(RMS_Titanic, row.vars = "Survival", col.vars = c("Class.Combined", "Gender"), title = "Survival by Class and Gender", format = "col_frequency")
## Survival by Class and Gender
##         1 0 2 0 3 0 Crew 0 1 1 2 1 3 1 Crew 1
## 0       118 154 418 676    5   12  110 3     
## 1       63  26  75  192    138 93  105 20    
## Total N 181 180 493 868    143 105 215 23

Explain how the three tables relate to each other.

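  1. The first table groups the columns by gender: all classes for men (0), then all classes for women (1).
  2. The second table groups the columns by class, with men and women side by side within each class.
  3. The third table has the same layout as the first table but shows frequencies instead of column percents.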

Describe the relationship between Gender, Class.Combined and Survival. Remember that survival is the dependent variable.

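In first class, men were far more likely to die than women: 65.2 percent of first class men died, compared with 3.5 percent of first class women. Women also survived at higher rates than men in second class, in third class, and among the crew.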

Now let’s add a third variable, Child. Remember that for RMS_Titanic we have to create the Child variable.

RMS_Titanic$Child <- as.numeric(RMS_Titanic$Age) <= 15
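
To see what this line produces, here is a small sketch on made-up ages. The `age` vector is hypothetical, and it assumes that Age may be stored as text with some values missing.

```r
# Hypothetical ages stored as text, including one missing value.
age <- c("2", "15", "16", "40", NA)

# as.numeric() converts the text to numbers; <= 15 then gives TRUE for
# children, FALSE for adults, and NA where the age is unknown.
child <- as.numeric(age) <= 15
child
```

People whose age is unknown get NA for Child, so they may drop out of any table that uses this variable.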

# Run a cross tab with Class.Combined, Gender and Child as column variables and survival as the row variable. 
# You can run several variations if you want.
crosstab(RMS_Titanic, row.vars = "Survival", col.vars =c("Child","Class.Combined","Gender") , title= "Survival by Class, Gender and Child", format= "col_percent")
## Survival by Class, Gender and Child
##         FALSE 0 TRUE 0 FALSE 1 TRUE 1
## 0       80.2    56.8   24.7    41    
## 1       19.8    43.2   75.3    59    
## Total N 1638    74     425     61

Describe the combined relationships between the variables.

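The table’s columns combine Child (FALSE = adult, TRUE = child) and Gender (0 = male, 1 = female). Adult men had the lowest survival percentage (19.8 percent), followed by boys (43.2 percent), girls (59 percent), and adult women (75.3 percent).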

Overall (using all of the tables above as needed):

Does being a child increase or decrease someone’s chances of survival compared to not being a child?

On the Titanic, being a child increased your chances of survival. Children and women survived at higher rates than adult men.

Does being female (gender = 1) increase or decrease someone’s chances of survival compared to being a male?

Being female increased the chances of survival compared to being male: in the data, women survived at higher rates than men.

Does being a crew member increase or decrease someone’s chances of survival compared to being in first class?

Compared to being in 1st class, being a crew member decreased your chances of survival (23.8 percent versus 62 percent).

Does being in third class increase or decrease someone’s chances of survival compared to being in first class?

Compared to being in 1st class, being in 3rd class decreased your chances of survival. A larger share of 3rd class passengers died in the disaster (74.6 percent versus 38 percent).

Does being a crew member increase or decrease someone’s survival compared to being in third class?

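Overall, crew members survived at a slightly lower rate than third class passengers (23.8 percent versus 25.4 percent). Within each gender, however, being a crew member increased your chances of survival on the Titanic compared to third class: 22.1 versus 15.2 percent for men and 87 versus 48.8 percent for women.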

Does this seem to be getting a bit complex? That is one reason why sociologists will often switch to a linear model. These may seem complex at first but compared to reading a complex crosstab they may be easier.

Here’s how to run a linear model called a logistic regression. Just run it.

results<-glm(Survival~Class.Combined + Gender + Child, data=RMS_Titanic, family = binomial(link = logit))
coef(results)
##        (Intercept)    Class.Combined2    Class.Combined3 
##         -0.3807736         -1.0247356         -1.8974894 
## Class.CombinedCrew             Gender          ChildTRUE 
##         -0.8610766          2.5212848          0.8769063
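
The coefficients are on the log-odds scale, which can be hard to read directly. The sketch below retypes the values printed above into a named vector (`b` is just an illustrative name) and applies a few base R functions to them. It assumes Gender is coded 0 for male and 1 for female, and that 1st class is the reference category, as the output suggests.

```r
# Coefficients copied from the coef(results) output above.
b <- c("(Intercept)"        = -0.3807736,
       "Class.Combined2"    = -1.0247356,
       "Class.Combined3"    = -1.8974894,
       "Class.CombinedCrew" = -0.8610766,
       "Gender"             =  2.5212848,
       "ChildTRUE"          =  0.8769063)

# Sign of each coefficient: +1 raises the log-odds of survival, -1 lowers them.
sign(b)

# exp() turns log-odds coefficients into odds ratios
# (above 1 = higher odds of survival, below 1 = lower odds).
round(exp(b), 2)

# Compare two effect sizes by absolute value.
abs(b["Class.CombinedCrew"]) < abs(b["Class.Combined3"])

# Predicted survival probability for an adult man in 1st class
# (the reference group): the inverse logit of the intercept.
plogis(b["(Intercept)"])
```

With the fitted model itself, exp(coef(results)) gives the same odds ratios without retyping anything.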

We are going to discuss this together, but for right now:

Is the sign of ChildTRUE positive or negative?

The sign of ChildTRUE is positive.

Is the sign of Gender (being female in this case) positive or negative?

The sign of Gender (being female) is positive.

Is the sign of Class.CombinedCrew (being a crew member) positive or negative?

The sign of being a crew member is negative because crew members had a lower chance of surviving than 1st class passengers.

Is the sign of Class.Combined3 (being in 3rd Class) positive or negative?

The sign for 3rd class passengers is negative: they had a lower chance of surviving than the other classes.

Is the size (absolute value) of Class.CombinedCrew bigger or smaller than Class.Combined3?

The absolute value of Class.CombinedCrew (0.86) is smaller than the absolute value of Class.Combined3 (1.90).

How do these answers compare with the answers about the cross tabulation?