For this assignment I chose the Titanic dataset, not just because it was used in class, but because it is a great dataset for modeling a binary response variable: Survived takes only the values 0 (did not survive) and 1 (survived).
```r
library(titanic)
library(lmtest)
library(visreg)
```
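The outputs below show results but not the calls that produced them. Here is a minimal sketch of how they could be reproduced. One assumption is not stated in the original: rows with a missing Age were dropped before fitting. Every summary reports 713 null degrees of freedom (714 rows), whereas titanic_train has 891 rows, and lrtest() needs all models fitted to the same rows. The names m1, m2 and m3 are hypothetical.

```r
library(titanic)
library(lmtest)

# Assumption: keep only passengers with a recorded age, so all three models
# use the same 714 rows (required for a likelihood ratio test between them)
titanic_train <- titanic_train[!is.na(titanic_train$Age), ]

# Survived is coded 0/1, so a binomial (logistic) GLM is appropriate
m1 <- glm(Survived ~ Pclass, family = "binomial", data = titanic_train)
m2 <- glm(Survived ~ Pclass + Sex + (Pclass:Sex), family = "binomial", data = titanic_train)
m3 <- glm(Survived ~ Pclass + Age + (Pclass:Age), family = "binomial", data = titanic_train)

for (m in list(m1, m2, m3)) {
  print(summary(m))
  print(AIC(m))  # matches the first bare [1] value after each summary
  print(BIC(m))  # matches the second bare [1] value
}

# Compares consecutive models: m1 vs m2, then m2 vs m3
lrtest(m1, m2, m3)
```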
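The higher the passenger class (1st class being the highest), the higher the survival rate. Because Pclass enters the model as a number (1, 2, 3), this shows up as a negative Pclass coefficient (-0.91): the log-odds of survival fall as the class number rises from 1 to 3. This may be because 1st class passengers had priority when boarding the lifeboats.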
Females have a higher survival rate than males within the same passenger class (that is, holding passenger class fixed at 1, 2, or 3). This may be because women usually boarded the lifeboats before the men (which is consistent with the movie, lol).
There is a negative association between passenger age and chance of survival: within the same passenger class, younger passengers have a higher survival rate. This is especially clear when comparing the survival rates at age 28 and age 50 across all passenger classes.
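The three comparisons above (by class, by sex within class, by age within class) are the kind of thing visreg plots show. Here is a minimal sketch that assumes the same missing-Age filter as the outputs below and uses visreg()'s by, breaks and scale arguments. breaks = c(28, 50) requests cross-sections at exactly the two ages mentioned above; by default visreg uses the 10th, 50th and 90th percentiles. The object names are hypothetical.

```r
library(titanic)
library(visreg)

# Assumption: drop passengers with missing Age (714 rows remain, as in the outputs)
titanic_train <- titanic_train[!is.na(titanic_train$Age), ]
titanic_train$Sex <- factor(titanic_train$Sex)  # explicit factor for the `by` panels

fit_sex <- glm(Survived ~ Pclass + Sex + (Pclass:Sex), family = "binomial", data = titanic_train)
fit_age <- glm(Survived ~ Pclass + Age + (Pclass:Age), family = "binomial", data = titanic_train)

# Predicted survival probability (scale = "response") against class, one panel per sex
v_sex <- visreg(fit_sex, "Pclass", by = "Sex", scale = "response")

# Same idea, with cross-sections taken at ages 28 and 50
v_age <- visreg(fit_age, "Pclass", by = "Age", breaks = c(28, 50), scale = "response")
```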
```
##
## Call:
## glm(formula = Survived ~ Pclass, family = "binomial", data = titanic_train)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.4892  -0.7530  -0.7530   0.8948   1.6727
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  1.62053    0.22852   7.091 1.33e-12 ***
## Pclass      -0.91199    0.09854  -9.255  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 964.52  on 713  degrees of freedom
## Residual deviance: 870.73  on 712  degrees of freedom
## AIC: 874.73
##
## Number of Fisher Scoring iterations: 4
## [1] 874.7301
## [1] 883.8718
```
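The estimates in the Survived ~ Pclass summary above can be turned into predicted survival probabilities with the inverse logit, p = 1 / (1 + exp(-(b0 + b1 * Pclass))). This sketch does it by hand with R's plogis(), which computes exactly that function. The variable names are illustrative.

```r
# Estimates copied from the Survived ~ Pclass summary above
b0 <-  1.62053  # (Intercept)
b1 <- -0.91199  # Pclass

pclass <- 1:3
# plogis() is the logistic CDF, i.e. the inverse logit 1 / (1 + exp(-x))
p_survive <- plogis(b0 + b1 * pclass)
names(p_survive) <- paste0("class_", pclass)
round(p_survive, 3)

# Odds ratio for moving one class down (e.g. 1st -> 2nd)
exp(b1)
```

This gives roughly 0.67, 0.45 and 0.25 for 1st, 2nd and 3rd class. Since exp(-0.91199) is about 0.40, each step down in class multiplies the odds of survival by about 0.4.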
```
##
## Call:
## glm(formula = Survived ~ Pclass + Sex + (Pclass:Sex), family = "binomial",
##     data = titanic_train)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -2.8479  -0.7197  -0.5390   0.5109   2.0005
##
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)
## (Intercept)      6.1052     0.8523   7.163 7.87e-13 ***
## Pclass          -2.0674     0.3107  -6.655 2.83e-11 ***
## Sexmale         -6.0503     0.9079  -6.664 2.66e-11 ***
## Pclass:Sexmale   1.4306     0.3401   4.207 2.59e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 964.52  on 713  degrees of freedom
## Residual deviance: 649.63  on 710  degrees of freedom
## AIC: 657.63
##
## Number of Fisher Scoring iterations: 6
## [1] 657.6263
## [1] 675.9098
```
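The interaction coefficients in the Pclass + Sex + Pclass:Sex summary above are easier to read as probabilities. Female is the baseline level (the coefficient is labeled Sexmale), so females use the intercept and Pclass slope directly. For males, Sexmale is added to the intercept and Pclass:Sexmale to the slope. This is a hand computation from the printed estimates. The variable names are illustrative.

```r
# Estimates copied from the Pclass + Sex + Pclass:Sex summary above
b_int   <-  6.1052  # (Intercept): baseline sex is female
b_class <- -2.0674  # Pclass slope for females
b_male  <- -6.0503  # Sexmale: intercept shift for males
b_inter <-  1.4306  # Pclass:Sexmale: change in the Pclass slope for males

pclass <- 1:3
p_female <- plogis(b_int + b_class * pclass)
p_male   <- plogis(b_int + b_male + (b_class + b_inter) * pclass)

# Rows: sex; columns: passenger class 1, 2, 3
round(rbind(female = p_female, male = p_male), 3)
```

The predicted probabilities are roughly 0.98, 0.88 and 0.48 for females and 0.36, 0.23 and 0.14 for males in 1st, 2nd and 3rd class. Females come out higher in every class, which matches the interpretation at the top.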
```
##
## Call:
## glm(formula = Survived ~ Pclass + Age + (Pclass:Age), family = "binomial",
##     data = titanic_train)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -2.1813  -0.8529  -0.6155   1.0119   2.3777
##
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)  3.6323734  0.6520791   5.570 2.54e-08 ***
## Pclass      -1.2650659  0.2591713  -4.881 1.05e-06 ***
## Age         -0.0433827  0.0163847  -2.648   0.0081 **
## Pclass:Age   0.0006892  0.0074719   0.092   0.9265
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 964.52  on 713  degrees of freedom
## Residual deviance: 827.42  on 710  degrees of freedom
## AIC: 835.42
##
## Number of Fisher Scoring iterations: 4
## [1] 835.421
## [1] 853.7045
```
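The age claim can be checked against the Pclass + Age + Pclass:Age summary above. Within a class, the age slope on the log-odds scale is Age + Pclass:Age × Pclass. Because the interaction is tiny (p = 0.93), the age effect is essentially the same in every class. This sketch evaluates predicted survival at ages 28 and 50 from the printed estimates. The helper eta() is hypothetical.

```r
# Estimates copied from the Pclass + Age + Pclass:Age summary above
b_int   <-  3.6323734  # (Intercept)
b_class <- -1.2650659  # Pclass
b_age   <- -0.0433827  # Age
b_inter <-  0.0006892  # Pclass:Age (p = 0.93)

# Linear predictor (log-odds of survival) for a given class and age
eta <- function(pclass, age) {
  b_int + b_class * pclass + b_age * age + b_inter * pclass * age
}

pclass  <- 1:3
p_age28 <- plogis(eta(pclass, 28))
p_age50 <- plogis(eta(pclass, 50))
round(cbind(Pclass = pclass, age_28 = p_age28, age_50 = p_age50), 3)

# Age slope on the log-odds scale within each class
b_age + b_inter * pclass
```

All three class-specific age slopes are about -0.04 per year. The age-50 probability is therefore below the age-28 probability in each class.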
```
## Likelihood ratio test
##
## Model 1: Survived ~ Pclass
## Model 2: Survived ~ Pclass + Sex + (Pclass:Sex)
## Model 3: Survived ~ Pclass + Age + (Pclass:Age)
##   #Df  LogLik Df  Chisq Pr(>Chisq)
## 1   2 -435.37
## 2   4 -324.81  2 221.10  < 2.2e-16 ***
## 3   4 -413.71  0 177.79  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
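In the last row of the likelihood ratio table, Model 3 is compared with Model 2 on 0 degrees of freedom. The two models are not nested: one adds Sex, the other adds Age. That chi-square test is therefore not valid, and only the Model 1 vs Model 2 row is a proper likelihood ratio test. Information criteria are the usual way to compare non-nested models. For a 0/1 response the saturated log-likelihood is 0, so AIC = deviance + 2k and BIC = deviance + k·ln(n), where k is the number of coefficients and n = 714 rows. The sketch below recomputes both from the printed residual deviances. It reproduces, to within rounding, the pair of [1] values printed after each summary.

```r
# Residual deviances and coefficient counts copied from the three summaries above
fits <- data.frame(
  model    = c("Pclass", "Pclass * Sex", "Pclass * Age"),
  deviance = c(870.73, 649.63, 827.42),
  k        = c(2, 4, 4)
)
n <- 714  # 713 null degrees of freedom + 1

# With a binary response, -2 * logLik equals the residual deviance
fits$AIC <- fits$deviance + 2 * fits$k
fits$BIC <- fits$deviance + log(n) * fits$k
fits[order(fits$AIC), ]  # lower is better for both criteria
```

Both criteria rank the Pclass * Sex model first (AIC 657.63, BIC about 675.91), followed by Pclass * Age and then Pclass alone.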