Exercise 1
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 0.000 0.342 1.000 1.000
## 1st 2nd 3rd
## 322 280 711
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.1667 21.0000 30.0000 31.1900 41.0000 71.0000 680
## Factor w/ 3 levels "1st","2nd","3rd": 1 1 1 1 1 1 1 1 1 1 ...
Exercise 2
I believe that younger people are less likely to have their age documented because fewer records exist for them. The older people get, the more they interact with government agencies, which makes it more likely that their age is on record. Third-class passengers are probably also less likely to have a recorded age, because they were less likely to have gone to hospitals or doctors they could not afford.
Missing age can clearly be predicted by class. Passengers without a recorded age were also more likely to be missing specifics on origin, destination, or boat.
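One way to check the claim that class predicts missing age is to compare the share of missing ages within each class. This is a minimal sketch on a small invented data frame (`toy`, not the Titanic records); the column names `pclass` and `age` are assumed to match the real data.

```r
# Toy data, invented for illustration only (not the Titanic records).
toy <- data.frame(
  pclass = factor(c("1st", "1st", "1st", "2nd", "2nd", "2nd",
                    "3rd", "3rd", "3rd", "3rd"),
                  levels = c("1st", "2nd", "3rd")),
  age    = c(29, 41, NA, 35, NA, 22, NA, NA, 18, NA)
)

# Indicator: TRUE when age was not recorded.
toy$age_missing <- is.na(toy$age)

# Share of passengers with a missing age within each class.
missing_by_class <- tapply(toy$age_missing, toy$pclass, mean)
print(missing_by_class)
```

If the shares differ a lot between classes, class carries information about whether age is missing. The same indicator could also be used as the outcome of a logistic regression on class.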
Exercise 3
As class rises (3rd to 1st), the survival rate will increase as well. Survival and sex (female) will be positively correlated. Women from first class will be the group that survived the most. For males, I expect an inverse relationship between survival and age: younger men, as long as they are not too young (around their 20s), will have higher non-survival rates.
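Hypotheses like these can be checked by tabulating the survival rate for each combination of class and sex. The sketch below uses a tiny invented data frame (`toy`) with assumed column names `pclass`, `sex`, and `survived`. It only illustrates the mechanics and says nothing about the real results.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  pclass   = factor(c("1st", "1st", "1st", "1st", "3rd", "3rd", "3rd", "3rd")),
  sex      = factor(c("female", "female", "male", "male",
                      "female", "female", "male", "male")),
  survived = c(1, 1, 1, 0, 1, 0, 0, 0)
)

# Mean of a 0/1 outcome is the survival rate within each class/sex group.
rates <- aggregate(survived ~ pclass + sex, data = toy, FUN = mean)
print(rates)
```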
Exercise 4
Exercise 5
survived predicted by age
fm2 <- glm(survived ~ age, data = titanic_no_NA, family = binomial)
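A model of this form produces fitted survival probabilities, which can be turned into 0/1 predictions and compared with the observed outcomes. The following is a minimal sketch on simulated data. The data frame `toy`, the generating rule, and the 0.5 cutoff are all assumptions made for illustration, not taken from the exercise.

```r
set.seed(1)
n <- 200

# Simulated ages and outcomes; the generating rule is made up.
toy <- data.frame(age = round(runif(n, 1, 70)))
toy$survived <- rbinom(n, size = 1, prob = plogis(0.5 - 0.02 * toy$age))

# Same model form as above: survival predicted by age alone.
fm_toy <- glm(survived ~ age, data = toy, family = binomial)

# Fitted survival probabilities, then a 0/1 prediction at an assumed 0.5 cutoff.
p_hat <- predict(fm_toy, type = "response")
pred  <- as.integer(p_hat > 0.5)

# Share of cases where the predicted outcome matches the observed one.
accuracy <- mean(pred == toy$survived)
print(accuracy)
```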
Exercise 6
## [1] 0.6150442
## [1] 0.5283493
## [1] 0.81893
## n pclass age sex survived
## 1 262 NA 31.8944 NA 0.9465649
My fitted model 2 seems to have predicted wrong mainly for women in first class. There were 137 first-class cases that were predicted incorrectly, and most of them appear to be women. Across all three classes, the model predicted that more women would die than actually did. These women also seem to be around 30 to 37 years old. I believe the predictions were wrong because fitted model 2 predicts survival from age alone. Since most elderly people survived, the model seems to have put too much weight on the age of these women and predicted that they would die because they were younger but not children. Adding class to the model would probably improve the prediction. It would also be helpful to know whether passengers could swim, their health status at the time of the event, and how close their room was to a lifeboat. All of these could help us better predict whether someone survived.
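The diagnosis above (tabulating the wrong predictions by class and sex, then adding predictors) could be sketched as follows. Everything here is simulated: the data frame `toy`, the generating rule, and the model names `fm_age` and `fm_full` are hypothetical and stand in for the real data and fitted model 2.

```r
set.seed(2)
n <- 300

# Simulated passengers; all values are invented for illustration.
toy <- data.frame(
  age    = round(runif(n, 1, 70)),
  sex    = factor(sample(c("female", "male"), n, replace = TRUE)),
  pclass = factor(sample(c("1st", "2nd", "3rd"), n, replace = TRUE))
)

# Made-up rule: survival depends on age, sex, and class.
lin <- 1 - 0.02 * toy$age - 2 * (toy$sex == "male") -
  0.8 * (as.integer(toy$pclass) - 1)
toy$survived <- rbinom(n, size = 1, prob = plogis(lin))

# Age-only model versus a model that also uses class and sex.
fm_age  <- glm(survived ~ age, data = toy, family = binomial)
fm_full <- glm(survived ~ age + pclass + sex, data = toy, family = binomial)

# 0/1 predictions at an assumed 0.5 cutoff.
toy$pred_age  <- as.integer(predict(fm_age,  type = "response") > 0.5)
toy$pred_full <- as.integer(predict(fm_full, type = "response") > 0.5)

# Where does the age-only model go wrong? Count errors by class and sex.
wrong <- toy[toy$pred_age != toy$survived, ]
print(table(wrong$pclass, wrong$sex))

# Share of correct predictions under each model.
acc_age  <- mean(toy$pred_age  == toy$survived)
acc_full <- mean(toy$pred_full == toy$survived)
print(c(age_only = acc_age, with_class_and_sex = acc_full))
```

On real data, the error table shows which groups the age-only model misses, and the two accuracy values show whether the extra predictors help.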