In this lab you will run some basic regression models for your ship.
Load the data set for your ship:

```r
load("~/data/RMS_Titanic.Rda")
```
Let’s make nicer labels. For Survival we create a new variable, Survival_nominal, instead of overwriting the original, because the regressions later need Survival to stay a number (not a factor).
```r
# Keep numeric Survival (0/1); add a labelled factor copy for the tables
RMS_Titanic$Survival_nominal <- factor(RMS_Titanic$Survival, levels = c("0", "1"), labels = c("Died", "Survived"))
# Gender and Crew are overwritten with labelled factors
RMS_Titanic$Gender <- factor(RMS_Titanic$Gender, levels = c("0", "1"), labels = c("Male", "Female"))
RMS_Titanic$Crew <- factor(RMS_Titanic$Crew, levels = c("0", "1"), labels = c("Passenger", "Crew"))
```
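To see why the numeric copy is kept, here is a minimal sketch on a hypothetical five-value vector (invented for illustration, not the ship data). `factor()` with `levels` and `labels` turns the codes 0 and 1 into readable categories, while the numeric original can still be averaged; its mean is simply the share coded 1.

```r
# Hypothetical 0/1 survival codes (NOT the ship data)
survival <- c(0, 1, 1, 0, 0)

# Same recoding pattern as above: numeric codes become labelled categories
survival_nominal <- factor(survival, levels = c("0", "1"),
                           labels = c("Died", "Survived"))

levels(survival_nominal)  # "Died" "Survived"
table(survival_nominal)   # Died 3, Survived 2

# The numeric version can be averaged; its mean is the proportion coded 1
mean(survival)            # 0.4
```

Calling `mean()` on the factor instead returns `NA` with a warning, which is why the regressions later use the numeric Survival rather than Survival_nominal.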
For comparison, let’s run a frequency table of Survival_nominal and crosstabs of Survival_nominal by Gender, by Crew, and by Crew and Gender together. Then make a copy of that third crosstab with the order of its column variables switched.
```r
frequency(RMS_Titanic$Survival_nominal)
## Values   Freq Percent
## Died     1496    67.8
## Survived  712    32.2
## Total    2208     100
```
```r
pretty_tab(crosstab(RMS_Titanic, row.vars = "Survival_nominal", col.vars = "Gender",
                    title = "Survival by Gender", format = "col_percent"))
```
| | Male | Female |
|---|---|---|
| Died | 79.3 | 26.7 |
| Survived | 20.7 | 73.3 |
| Total N | 1722 | 486 |
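`frequency()`, `crosstab()` and `pretty_tab()` are not base R functions. As a hedged illustration of what `format = "col_percent"` produces, here is a base-R sketch using `prop.table()` on a small table of invented counts (not the ship data): each count is divided by its column total, so every column sums to 100, as the Died and Survived percentages do in the table above.

```r
# Hypothetical counts (NOT the ship data): rows = outcome, columns = group
tab <- matrix(c(30, 10,   # Group A: 30 died, 10 survived
                5, 15),   # Group B: 5 died, 15 survived
              nrow = 2,
              dimnames = list(Outcome = c("Died", "Survived"),
                              Group = c("A", "B")))

# Column percentages: each count divided by its column total, times 100
col_pct <- round(100 * prop.table(tab, margin = 2), 1)
col_pct        # A: 75 / 25, B: 25 / 75

# Column totals, comparable to the "Total N" row
colSums(tab)   # A 40, B 20
```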
```r
pretty_tab(crosstab(RMS_Titanic, row.vars = "Survival_nominal", col.vars = "Crew",
                    title = "Survival of Crew and Passengers",
                    format = "col_percent"))
```
| | Passenger | Crew |
|---|---|---|
| Died | 62 | 76.2 |
| Survived | 38 | 23.8 |
| Total N | 1317 | 891 |
```r
# This crosstab looks at survival across two variables at once: Crew, then Gender
pretty_tab(crosstab(RMS_Titanic, row.vars = "Survival_nominal", col.vars = c("Crew", "Gender"),
                    format = "col_percent", title = "Survival by Gender and Group"))
```
| | Crew Female | Crew Male | Passenger Female | Passenger Male |
|---|---|---|---|---|
| Died | 13 | 77.9 | 27.4 | 80.8 |
| Survived | 87 | 22.1 | 72.6 | 19.2 |
| Total N | 23 | 868 | 463 | 854 |
```r
# Same crosstab with the column variables in the opposite order: Gender, then Crew
pretty_tab(crosstab(RMS_Titanic, row.vars = "Survival_nominal", col.vars = c("Gender", "Crew"),
                    title = "Survival by Gender and Group", format = "col_percent"))
```
| | Female Crew | Female Passenger | Male Crew | Male Passenger |
|---|---|---|---|---|
| Died | 13 | 27.4 | 77.9 | 80.8 |
| Survived | 87 | 72.6 | 22.1 | 19.2 |
| Total N | 23 | 463 | 868 | 854 |
Now we will run three regressions: first with no independent variables, then with Gender, then with Gender and Crew. We use the lm() function, which fits a linear model. Because Survival is coded 0 and 1, each predicted value can be read as a predicted probability of surviving.
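For reference, the three models share one general form. Here $\hat{Y}$ is the predicted value of Survival, $b_0$ is the intercept, and Female and Crew are 0/1 indicator variables. Judging by the coefficient names GenderFemale and CrewCrew that lm() prints below, R treats Male and Passenger as the reference (0) categories. The $b$ values are estimated separately in each model, so they need not be equal across models.

$$
\begin{aligned}
\text{No independent variable:}\quad & \hat{Y} = b_0 \\
\text{Gender only:}\quad & \hat{Y} = b_0 + b_1\,\text{Female} \\
\text{Gender and Crew:}\quad & \hat{Y} = b_0 + b_1\,\text{Female} + b_2\,\text{Crew}
\end{aligned}
$$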
```r
lm(Survival ~ 1, RMS_Titanic)
##
## Call:
## lm(formula = Survival ~ 1, data = RMS_Titanic)
##
## Coefficients:
## (Intercept)
##      0.3225
```
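The intercept-only model can be checked by hand. Assuming Survival contains only 0s and 1s with no missing values (the frequency table shows just Died and Survived, totalling 2208), the counts printed by frequency() are enough to rebuild it. This sketch recreates a 0/1 vector with 1496 zeros and 712 ones and refits the model; the intercept comes out as the mean of the 0/1 variable, which is the proportion who survived.

```r
# Counts copied from the frequency() output above
died <- 1496
survived <- 712

# Rebuild a 0/1 outcome with the same counts (row order does not affect the fit)
y <- rep(c(0, 1), times = c(died, survived))

# Intercept-only linear model: its single coefficient is the mean of y
fit0 <- lm(y ~ 1)
b0 <- unname(coef(fit0)[1])

round(b0, 4)        # 0.3225, matching the (Intercept) printed above
round(mean(y), 4)   # 0.3225, i.e. 712 / 2208
```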
```r
lm(Survival ~ Gender, RMS_Titanic)
##
## Call:
## lm(formula = Survival ~ Gender, data = RMS_Titanic)
##
## Coefficients:
## (Intercept)  GenderFemale
##      0.2067        0.5258
```
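A minimal sketch of turning the Gender-only output into predicted values, with the coefficients typed in by hand from the lm() output. The names `b0`, `b_female` and `predict_gender` are hypothetical, chosen only for this illustration. Female is a 0/1 indicator, so the male prediction is the intercept alone and the female prediction adds the GenderFemale coefficient.

```r
# Coefficients copied by hand from lm(Survival ~ Gender) above
b0 <- 0.2067        # (Intercept): predicted value when Female = 0 (males)
b_female <- 0.5258  # GenderFemale: difference between females and males

# Hypothetical helper: predicted value = b0 + b_female * Female
predict_gender <- function(female) b0 + b_female * female

pred_male <- predict_gender(0)
pred_female <- predict_gender(1)

c(Male = pred_male, Female = pred_female)   # 0.2067 0.7325
```

Multiplying by 100 puts these on the same scale as the percentages in the Gender crosstab.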
```r
lm(Survival ~ Gender + Crew, RMS_Titanic)
##
## Call:
## lm(formula = Survival ~ Gender + Crew, data = RMS_Titanic)
##
## Coefficients:
## (Intercept)  GenderFemale      CrewCrew
##     0.18924       0.54163       0.03472
```
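The same arithmetic extends to two indicator variables. This sketch builds all four Gender by Crew combinations with `expand.grid()` and applies the fitted equation to each, again using coefficients copied by hand from the output (the variable names are hypothetical).

```r
# Coefficients copied by hand from lm(Survival ~ Gender + Crew) above;
# Male and Passenger are the reference (0) categories
b0 <- 0.18924        # (Intercept)
b_female <- 0.54163  # GenderFemale
b_crew <- 0.03472    # CrewCrew

# Every Gender x Crew combination (expand.grid varies Gender fastest)
groups <- expand.grid(Gender = c("Male", "Female"),
                      Crew = c("Passenger", "Crew"),
                      stringsAsFactors = FALSE)

# 0/1 indicator variables for each row
female <- as.numeric(groups$Gender == "Female")
crew <- as.numeric(groups$Crew == "Crew")

# Predicted value = b0 + b_female * Female + b_crew * Crew
groups$predicted <- b0 + b_female * female + b_crew * crew

groups
# Predicted: Male/Passenger 0.18924, Female/Passenger 0.73087,
#            Male/Crew 0.22396, Female/Crew 0.76559
```

With the data loaded, calling `predict()` on the fitted Gender + Crew model with `newdata = groups` should give the same values, up to the rounding in the printed coefficients.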
Write out the three equations below:
No independent variable:
Gender only:
Gender and Crew:
Calculate the predicted value from the equation with no independent variable:
How does the equation relate to the frequency distribution results?
Calculate the predicted values for males and females from the Gender-only model.
Did being female help, hurt, or make no difference?
How do these results compare to the crosstab for Gender?
For the equation with Gender and Crew as independent variables, calculate the predicted values for:
- Female, Crew
- Female, Passenger
- Male, Crew
- Male, Passenger
Which group has the highest predicted probability of surviving?
Did Gender have an effect that was positive, negative or close to 0?
Did it stay the same as in the Gender-only equation? If not, how did it change?
Did Crew have an effect that was positive, negative or close to 0?
How do the predicted values from the equation with two variables compare to the crosstabs?