The dataset that was used this week is a Game of Thrones character prediction that was obtained through kaggle.com. Using this dataset, I intend to measure the popularity of gender and age that has been collected. The data is generally compsed of 33 variables and 1900 observations. The binary variable being used is isAlive which indicates whether or not a character from the novels has been killed of. The variable has been coded using 0 for dead and 1 for living. Other variables being used are popularity, isNoble, isMarried, and age.
## # A tibble: 6 x 23
## S.no plod name title male culture dateOfBirth DateoFdeath NobleHouse
## <dbl> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl> <chr>
## 1 1922 0.885 Barra None 1 Unkown 298 0 Baratheon
## 2 1195 0.563 Edri~ None 1 Unkown 287 0 Baratheon
## 3 1785 0.55 Renl~ Lord~ 1 Storml~ 277 0 Baratheon
## 4 1812 0.934 Stef~ Stor~ 1 Unkown 246 0 Baratheon
## 5 1682 0.426 Dona~ None 1 Unkown 244 0 Baratheon
## 6 1683 0.026 Dome~ None 1 Northm~ 279 0 Bolton
## # ... with 14 more variables: spouse <chr>, book1 <dbl>, book2 <dbl>,
## # book3 <dbl>, book4 <dbl>, book5 <dbl>, isMarried <dbl>, isNoble <dbl>,
## # age <dbl>, numDeadRelations <dbl>, boolDeadRelations <dbl>,
## # isPopular <dbl>, popularity <dbl>, isAlive <dbl>
model_1 <- glm(isAlive ~ popularity, family = 'binomial', data = GoT.Character.Prediction)
summary(model_1)
##
## Call:
## glm(formula = isAlive ~ popularity, family = "binomial", data = GoT.Character.Prediction)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3773 -1.3679 0.9921 0.9950 1.0263
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.45843 0.18526 2.475 0.0133 *
## popularity -0.09194 0.50347 -0.183 0.8551
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 293.43 on 218 degrees of freedom
## Residual deviance: 293.40 on 217 degrees of freedom
## AIC: 297.4
##
## Number of Fisher Scoring iterations: 4
From model 1, the coefficient is negative. From this, it can be interpretted as character popularity decreases when character status is living. Interpreting the y-intercept, it can be stated that the odds of a character still being alive throughout the Game of Thrones story is .61292. The value of the slope can be interpreted as for every one-unit of popularity, the log odds of a character still being alive decreases by -.023. Moreover, from the z value, is less than one standard deviation away from zero indicating that the results are not statistically significant.
model_2 <- glm(isAlive ~ popularity + isNoble, family = 'binomial', data = GoT.Character.Prediction)
summary(model_2)
##
## Call:
## glm(formula = isAlive ~ popularity + isNoble, family = "binomial",
## data = GoT.Character.Prediction)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9983 -0.9041 0.6911 0.7441 1.6249
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.0496 0.2324 4.517 6.27e-06 ***
## popularity 1.8626 0.6332 2.942 0.00326 **
## isNoble -2.1521 0.3603 -5.973 2.33e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 293.43 on 218 degrees of freedom
## Residual deviance: 252.83 on 216 degrees of freedom
## AIC: 258.83
##
## Number of Fisher Scoring iterations: 4
For model 2, the variable of isNoble has be added. Based on this addition, the odds of a character being alive is significantly reduced by -2.15 and the log odds of survival based on character popularity has increased to 1.86.
model_3 <- glm(isAlive ~ popularity + isNoble + isMarried, family = 'binomial', data = GoT.Character.Prediction)
summary(model_3)
##
## Call:
## glm(formula = isAlive ~ popularity + isNoble + isMarried, family = "binomial",
## data = GoT.Character.Prediction)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9705 -0.8856 0.6665 0.7343 1.7102
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.1282 0.2381 4.737 2.17e-06 ***
## popularity 1.8814 0.6376 2.951 0.00317 **
## isNoble -1.8480 0.3938 -4.692 2.70e-06 ***
## isMarried -0.6484 0.3601 -1.801 0.07175 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 293.43 on 218 degrees of freedom
## Residual deviance: 249.62 on 215 degrees of freedom
## AIC: 257.62
##
## Number of Fisher Scoring iterations: 4
Model 3 introduces the vairable of isMarried. The addition of this variable finds that the log odds of character survival for characyers that are married is significantly reduced by -.65. However, when compared to isNoble, the log odds of survival are higher than log odds of survival for nobles, even with a significantly increase from -2.15 to -1.84.
model_4 <- glm(isAlive ~ popularity + isNoble + isMarried + age, family = 'binomial', data = GoT.Character.Prediction)
summary(model_4)
##
## Call:
## glm(formula = isAlive ~ popularity + isNoble + isMarried + age,
## family = "binomial", data = GoT.Character.Prediction)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0148 -0.8982 0.6227 0.7398 1.8745
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.609293 0.321561 5.005 5.6e-07 ***
## popularity 1.643093 0.652345 2.519 0.011777 *
## isNoble -1.584641 0.411720 -3.849 0.000119 ***
## isMarried -0.500931 0.369652 -1.355 0.175372
## age -0.017385 0.007463 -2.330 0.019830 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 293.43 on 218 degrees of freedom
## Residual deviance: 243.93 on 214 degrees of freedom
## AIC: 253.93
##
## Number of Fisher Scoring iterations: 4
With model 4, we introduce the variable of age. This model indicates that the each year a character ages, the log odds of survival decreases -.017.
model_5 <- glm(isAlive ~ (popularity * male) + isNoble + isMarried + (age * male), family = 'binomial' , data = GoT.Character.Prediction)
summary(model_5)
##
## Call:
## glm(formula = isAlive ~ (popularity * male) + isNoble + isMarried +
## (age * male), family = "binomial", data = GoT.Character.Prediction)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.2238 -0.8871 0.3369 0.8823 1.9411
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.25589 0.72252 4.506 6.6e-06 ***
## popularity 2.51583 1.83451 1.371 0.17025
## male -2.64899 0.81923 -3.234 0.00122 **
## isNoble -1.34606 0.42100 -3.197 0.00139 **
## isMarried -0.70989 0.38055 -1.865 0.06212 .
## age -0.04304 0.01394 -3.088 0.00202 **
## popularity:male -0.98097 1.89469 -0.518 0.60463
## male:age 0.04232 0.01634 2.590 0.00959 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 293.43 on 218 degrees of freedom
## Residual deviance: 222.62 on 211 degrees of freedom
## AIC: 238.62
##
## Number of Fisher Scoring iterations: 5
Adding the vairable of male to interact with popularity, the data indicates that based on popularity the log odds of survival among males is -.98 and the odds of survivial for males based on age is .042. We can therefore indicate that gender does influence does influence the log odds of survival.
| Model 1 | Model 2 | Model 3 | Model 4 | ||
|---|---|---|---|---|---|
| (Intercept) | 0.460* | 0.730*** | 0.745*** | 0.838*** | |
| (0.185) | (0.044) | (0.044) | (0.058) | ||
| popularity | -0.096 | 0.417** | 0.414** | 0.356** | |
| (0.504) | (0.127) | (0.126) | (0.127) | ||
| isNoble | -0.481*** | -0.409*** | -0.344*** | ||
| (0.070) | (0.079) | (0.082) | |||
| isMarried | -0.139 | -0.109 | |||
| (0.074) | (0.074) | ||||
| age | -0.003* | ||||
| (0.001) | |||||
| AIC | 297.396 | 271.969 | 270.416 | 266.300 | |
| BIC | 304.175 | 285.525 | 287.362 | 286.634 | |
| Log Likelihood | -146.698 | -131.984 | -130.208 | -127.150 | |
| Deviance | 293.396 | 42.800 | 42.111 | 40.951 | |
| Num. obs. | 219 | 219 | 219 | 219 | |
| p < 0.001, p < 0.01, p < 0.05 | |||||
Based on both the AIC and BIC, it can be concluded not only that the best model to use is Model 4, but also that the models get show improvement as variables are included.
library(visreg)
visreg(model_4, "popularity", by = "age", scale = "response")
As we can see, based on both popularity and age, younger Game of Throne characters are more likely to be alive when compared to the older ages.
library(visreg)
visreg(model_5, "popularity", by = "male", scale = "response")
The first graph indicates that females characters not only have higer popularity, by they they also have higher odds of surviving the Game of Thrones
library(interactions)
## Warning: package 'interactions' was built under R version 3.5.3
interact_plot(model_5, pred = popularity , modx = male)
The interaction graph further reinforces the notion that female characters are not only more popular, but that they also have higher odds of survival.