Exercise 2. Contingency analysis for the ‘Titanic’ data.

Let’s consider the titanic dataset which contains a complete list of passengers and crew members on the RMS Titanic. It includes a variable indicating whether a person did survive the sinking of the RMS Titanic on April 15, 1912. A data frame contains 2456 observations on 14 variables.

The website http://www.encyclopedia-titanica.org/ offers detailed information about passengers and crew members on the RMS Titanic. According to the website 1317 passengers and 890 crew member were aboard.

8 musicians and 9 employees of the shipyard company are listed as passengers, but travelled with a free ticket, which is why they have NA values in fare. In addition to that, fare is truely missing for a few regular passengers.

Analysis

First we clean the empty values in Status column then create a geom bar with general data of survivours and victims

dane2<-titanic
cleanStatus <- dane2[!dane2$Status == "", ]

ggplot(cleanStatus, aes(x = Status)) +
  geom_bar() +
  geom_text(stat = "count", aes(label = ..count..), position = position_dodge(width = 0.9), vjust = -0.5) +
  labs(title = "Survivors and Victims",
       x = "Status",
       y = "Count") +
  theme_minimal()
## Warning: The dot-dot notation (`..count..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(count)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Now lets look at gender in context of survival rates

ggplot(cleanStatus, aes(x = Status, fill = Gender)) +
  geom_bar(position = "dodge") +
  geom_text(stat = "count", aes(label = ..count..), position = position_dodge(width = 0.9), vjust = -0.5) +
  labs(title = "Survival Rate by Gender",
       x = "Status",
       y = "Count",
       fill = "Gender") +
  theme_minimal()

corelation estimation

cleanStatus$status_binary <- ifelse(cleanStatus$Status == "Survivor", 1, 0)
cleanStatus$gender_binary <- ifelse(cleanStatus$Gender == "Female", 1, 0)

status_gender <- table(cleanStatus$status_binary, cleanStatus$gender_binary)

phi_coefficient <- cor(cleanStatus$status_binary, cleanStatus$gender_binary)
print(phi_coefficient)
## [1] 0.4703662

0.47 result indicates a noticible relation between gender and survival rate. It is not an extremly strong relation but it’s not insignificant either. This suggests that females had a higher likelihood of surviving compared to males, but other factors also contribute to survival chances.

And now let’s move on to vizualising the survival rates in terms of being a crew member or a passenger

ggplot(cleanStatus, aes(x = Status, fill = Crew.or.Passenger.)) +
  geom_bar(position = "dodge") +
  geom_text(stat = "count", aes(label = ..count..), position = position_dodge(width = 0.9), vjust = -0.5) +
  labs(title = "Survival Rate by the Cause of the Excursion",
       x = "Status",
       y = "Crew Member or Passenger") +
  theme_minimal()

# corelation estimation

cleanStatus$Crew.or.Passenger_binary <- ifelse(cleanStatus$Crew.or.Passenger == "Passenger", 1, 0)

status_Crew.or.Passenger <- table(cleanStatus$status_binary, cleanStatus$gender_binary)

phi_coefficient <- cor(cleanStatus$status_binary, cleanStatus$Crew.or.Passenger_binary)

print(phi_coefficient)
## [1] 0.1496655

A correlation coefficient of 0.15 shows a weak positive relationship, meaning passengers were slightly more likely to survive than crew members on the Titanic.

Survival based on class

ggplot(cleanStatus, aes(x = Class...Department, fill = Status)) +
  geom_bar(position = "dodge") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  geom_text(stat = "count", aes(label = ..count..), position = position_dodge(width = 0.9), vjust = -0.5) +
  labs(title = "Survival Rate by class and department of passengers",
       x = "Status",
       y = "Count",
       fill = "Class Department") 

we single out only passengers of ship

onlyPassengers <- cleanStatus[!cleanStatus$Crew.or.Passenger. == "Crew", ]

ggplot(onlyPassengers, aes(x = Class...Department, fill = Status)) +
  geom_bar(position = "dodge") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  geom_text(stat = "count", aes(label = ..count..), position = position_dodge(width = 0.9), vjust = -0.5) +
  labs(title = "Survival Rate by class and department of passengers",
       x = "Status",
       y = "Count",
       fill = "Class Department") 

LS0tDQp0aXRsZTogJ0Rlc2NyaXB0aXZlIFN0YXRpc3RpY3MnDQpzdWJ0aXRsZTogJ0JpdmFyaWF0ZSBBbmFseXNpcycNCmRhdGU6ICJgciBTeXMuRGF0ZSgpYCINCmF1dGhvcjogIllvdXIgbmFtZSBoZXJlIg0Kb3V0cHV0Og0KICBodG1sX2RvY3VtZW50OiANCiAgICB0aGVtZTogY2VydWxlYW4NCiAgICBoaWdobGlnaHQ6IHRleHRtYXRlDQogICAgZm9udHNpemU6IDEwcHQNCiAgICB0b2M6IHllcw0KICAgIGNvZGVfZG93bmxvYWQ6IHllcw0KICAgIHRvY19mbG9hdDoNCiAgICAgIGNvbGxhcHNlZDogbm8NCiAgICBkZl9wcmludDogZGVmYXVsdA0KICAgIHRvY19kZXB0aDogNQ0KZWRpdG9yX29wdGlvbnM6IA0KICBtYXJrZG93bjogDQogICAgd3JhcDogNzINCi0tLQ0KDQpgYGB7ciBzZXR1cCwJbWVzc2FnZSA9IEZBTFNFLAl3YXJuaW5nID0gRkFMU0UsCWluY2x1ZGUgPSBGQUxTRX0NCmxpYnJhcnkoZHBseXIpDQpsaWJyYXJ5KHRpZHl2ZXJzZSkNCmxpYnJhcnkoSFNBVVIzKQ0KbGlicmFyeShoYXZlbikNCmxpYnJhcnkoZ2dwbG90MikNCmxpYnJhcnkoZ3JpZEV4dHJhKQ0KbGlicmFyeShwcGNvcikgIyB0aGlzIHBhY2thZ2UgY29tcHV0ZXMgcGFydGlhbCBhbmQgc2VtaXBhcnRpYWwgY29ycmVsYXRpb25zLg0KbGlicmFyeShsdG0pICMgdGhpcyBwYWNrYWdlIGNvbXB1dGVzIHBvaW50LWJpc2VyaWFsIGNvcnJlbGF0aW9ucy4NCmxpYnJhcnkoZGV2dG9vbHMpIA0KaW5zdGFsbF9naXRodWIoIm1hcmtoZWNrbWFubi9yeW91cmVhZHkiKSAjIHBsZWFzZSBpbnN0YWxsIHBhY2thZ2UgInJ5b3VyZWFkeSIgZnJvbSBnaXRodWIhICh0aGVuICMgaXQpDQpsaWJyYXJ5KHJ5b3VyZWFkeSkgIyB0aGlzIHBhY2thZ2UgY29tcHV0ZXMgbm9ubGluZWFyICJldGEiIGNvcnJlbGF0aW9ucy4NCmxpYnJhcnkoR0dhbGx5KSAjIHRoaXMgcGFja2FnZSBjb21wdXRlcyBjb3JyZWxhdGlvbiBtYXRyaXguDQpsaWJyYXJ5KHBzeWNoKSAjIHRoaXMgcGFja2FnZSBjb21wdXRlcyBxdWFsaXRhdGl2ZSBjb3JyZWxhdGlvbnMuDQpsaWJyYXJ5KERlc2NUb29scykgIyB0aGlzIHBhY2thZ2UgY29tcHV0ZXMgcXVhbGl0YXRpdmUgY29ycmVsYXRpb25zLg0KYGBgDQoNCg0KIyMgRXhlcmNpc2UgMi4gQ29udGluZ2VuY3kgYW5hbHlzaXMgZm9yIHRoZSAnVGl0YW5pYycgZGF0YS4NCg0KTGV0J3MgY29uc2lkZXIgdGhlIHRpdGFuaWMgZGF0YXNldCB3aGljaCBjb250YWlucyBhIGNvbXBsZXRlIGxpc3Qgb2YgcGFzc2VuZ2VycyBhbmQgY3JldyBtZW1iZXJzIG9uIHRoZSBSTVMgVGl0YW5pYy4gSXQgaW5jbHVkZXMgYSB2YXJpYWJsZSBpbmRpY2F0aW5nIHdoZXRoZXIgYSBwZXJzb24gZGlkIHN1cnZpdmUgdGhlIHNpbmtpbmcgb2YgdGhlIFJNUyBUaXRhbmljIG9uIEFwcmlsIDE1LCAxOTEyLg0KQSBkYXRhIGZyYW1lIGNvbnRhaW5zIDI0NTYgb2JzZXJ2YXRpb25zIG9uIDE0IHZhcmlhYmxlcy4NCg0KYGBge3IgbG9hZC1kYXRhMiwgd2FybmluZz1UUlVFLCBpbmNsdWRlPUZBTFNFfQ0KZG93bmxvYWQuZmlsZSgiaHR0cHM6Ly9naXRodWIuY29tL2tmbGlzaWtvd3NraS9kcy9ibG9iL21hc3Rlci90aXRhbmljLmNzdj9yYXc9dHJ1ZSIsIGRlc3RmaWxlID0idGl0YW5pYy5jc3YiLG1vZGU9IndiIikNCnRpdGFuaWMgPC0gcmVhZC5jc3YoInRpdGFuaWMuY3N2Iixyb3cubmFtZXM9MSxzZXA9IjsiKQ0KYGBgDQoNClRoZSB3ZWJzaXRlIGh0dHA6Ly93d3cuZW5jeWNsb3BlZGlhLXRpdGFuaWNhLm9yZy8gb2ZmZXJzIGRldGFpbGVkIGluZm9ybWF0aW9uIGFib3V0IHBhc3NlbmdlcnMgYW5kIGNyZXcgbWVtYmVycyBvbiB0aGUgUk1TIFRpdGFuaWMuIEFjY29yZGluZyB0byB0aGUgd2Vic2l0ZSAxMzE3IHBhc3NlbmdlcnMgYW5kIDg5MCBjcmV3IG1lbWJlciB3ZXJlIGFib2FyZC4NCg0KOCBtdXNpY2lhbnMgYW5kIDkgZW1wbG95ZWVzIG9mIHRoZSBzaGlweWFyZCBjb21wYW55IGFyZSBsaXN0ZWQgYXMgcGFzc2VuZ2VycywgYnV0IHRyYXZlbGxlZCB3aXRoIGEgZnJlZSB0aWNrZXQsIHdoaWNoIGlzIHdoeSB0aGV5IGhhdmUgTkEgdmFsdWVzIGluIGZhcmUuIEluIGFkZGl0aW9uIHRvIHRoYXQsIGZhcmUgaXMgdHJ1ZWx5IG1pc3NpbmcgZm9yIGEgZmV3IHJlZ3VsYXIgcGFzc2VuZ2Vycy4gDQoNCiMjIEFuYWx5c2lzDQojIEZpcnN0IHdlIGNsZWFuIHRoZSBlbXB0eSB2YWx1ZXMgaW4gU3RhdHVzIGNvbHVtbiB0aGVuIGNyZWF0ZSBhIGdlb20gYmFyIHdpdGggZ2VuZXJhbCBkYXRhIG9mIHN1cnZpdm91cnMgYW5kIHZpY3RpbXMNCg0KYGBge3J9DQpkYW5lMjwtdGl0YW5pYw0KY2xlYW5TdGF0dXMgPC0gZGFuZTJbIWRhbmUyJFN0YXR1cyA9PSAiIiwgXQ0KDQpnZ3Bsb3QoY2xlYW5TdGF0dXMsIGFlcyh4ID0gU3RhdHVzKSkgKw0KICBnZW9tX2JhcigpICsNCiAgZ2VvbV90ZXh0KHN0YXQgPSAiY291bnQiLCBhZXMobGFiZWwgPSAuLmNvdW50Li4pLCBwb3NpdGlvbiA9IHBvc2l0aW9uX2RvZGdlKHdpZHRoID0gMC45KSwgdmp1c3QgPSAtMC41KSArDQogIGxhYnModGl0bGUgPSAiU3Vydml2b3JzIGFuZCBWaWN0aW1zIiwNCiAgICAgICB4ID0gIlN0YXR1cyIsDQogICAgICAgeSA9ICJDb3VudCIpICsNCiAgdGhlbWVfbWluaW1hbCgpDQoNCg0KYGBgDQoNCiMgTm93IGxldHMgbG9vayBhdCBnZW5kZXIgaW4gY29udGV4dCBvZiBzdXJ2aXZhbCByYXRlcw0KYGBge3J9DQoNCmdncGxvdChjbGVhblN0YXR1cywgYWVzKHggPSBTdGF0dXMsIGZpbGwgPSBHZW5kZXIpKSArDQogIGdlb21fYmFyKHBvc2l0aW9uID0gImRvZGdlIikgKw0KICBnZW9tX3RleHQoc3RhdCA9ICJjb3VudCIsIGFlcyhsYWJlbCA9IC4uY291bnQuLiksIHBvc2l0aW9uID0gcG9zaXRpb25fZG9kZ2Uod2lkdGggPSAwLjkpLCB2anVzdCA9IC0wLjUpICsNCiAgbGFicyh0aXRsZSA9ICJTdXJ2aXZhbCBSYXRlIGJ5IEdlbmRlciIsDQogICAgICAgeCA9ICJTdGF0dXMiLA0KICAgICAgIHkgPSAiQ291bnQiLA0KICAgICAgIGZpbGwgPSAiR2VuZGVyIikgKw0KICB0aGVtZV9taW5pbWFsKCkNCg0KDQpgYGANCg0KIyBjb3JlbGF0aW9uIGVzdGltYXRpb24NCmBgYHtyfQ0KDQpjbGVhblN0YXR1cyRzdGF0dXNfYmluYXJ5IDwtIGlmZWxzZShjbGVhblN0YXR1cyRTdGF0dXMgPT0gIlN1cnZpdm9yIiwgMSwgMCkNCmNsZWFuU3RhdHVzJGdlbmRlcl9iaW5hcnkgPC0gaWZlbHNlKGNsZWFuU3RhdHVzJEdlbmRlciA9PSAiRmVtYWxlIiwgMSwgMCkNCg0Kc3RhdHVzX2dlbmRlciA8LSB0YWJsZShjbGVhblN0YXR1cyRzdGF0dXNfYmluYXJ5LCBjbGVhblN0YXR1cyRnZW5kZXJfYmluYXJ5KQ0KDQpwaGlfY29lZmZpY2llbnQgPC0gY29yKGNsZWFuU3RhdHVzJHN0YXR1c19iaW5hcnksIGNsZWFuU3RhdHVzJGdlbmRlcl9iaW5hcnkpDQpwcmludChwaGlfY29lZmZpY2llbnQpDQoNCg0KYGBgDQogMC40NyByZXN1bHQgaW5kaWNhdGVzIGEgbm90aWNpYmxlIHJlbGF0aW9uIGJldHdlZW4gZ2VuZGVyIGFuZCBzdXJ2aXZhbCByYXRlLiBJdCBpcyBub3QgYW4gZXh0cmVtbHkgc3Ryb25nDQogcmVsYXRpb24gYnV0IGl0J3Mgbm90IGluc2lnbmlmaWNhbnQgZWl0aGVyLiBUaGlzIHN1Z2dlc3RzIHRoYXQgZmVtYWxlcyBoYWQgYSBoaWdoZXIgbGlrZWxpaG9vZCBvZiBzdXJ2aXZpbmcgDQpjb21wYXJlZCB0byBtYWxlcywgYnV0IG90aGVyIGZhY3RvcnMgYWxzbyBjb250cmlidXRlIHRvIHN1cnZpdmFsIGNoYW5jZXMuDQoNCg0KDQojIEFuZCBub3cgbGV0J3MgbW92ZSBvbiB0byB2aXp1YWxpc2luZyB0aGUgc3Vydml2YWwgcmF0ZXMgaW4gdGVybXMgb2YgYmVpbmcgYSBjcmV3IG1lbWJlciBvciBhIHBhc3Nlbmdlcg0KYGBge3J9DQpnZ3Bsb3QoY2xlYW5TdGF0dXMsIGFlcyh4ID0gU3RhdHVzLCBmaWxsID0gQ3Jldy5vci5QYXNzZW5nZXIuKSkgKw0KICBnZW9tX2Jhcihwb3NpdGlvbiA9ICJkb2RnZSIpICsNCiAgZ2VvbV90ZXh0KHN0YXQgPSAiY291bnQiLCBhZXMobGFiZWwgPSAuLmNvdW50Li4pLCBwb3NpdGlvbiA9IHBvc2l0aW9uX2RvZGdlKHdpZHRoID0gMC45KSwgdmp1c3QgPSAtMC41KSArDQogIGxhYnModGl0bGUgPSAiU3Vydml2YWwgUmF0ZSBieSB0aGUgQ2F1c2Ugb2YgdGhlIEV4Y3Vyc2lvbiIsDQogICAgICAgeCA9ICJTdGF0dXMiLA0KICAgICAgIHkgPSAiQ3JldyBNZW1iZXIgb3IgUGFzc2VuZ2VyIikgKw0KICB0aGVtZV9taW5pbWFsKCkNCg0KYGBgDQojIGNvcmVsYXRpb24gZXN0aW1hdGlvbg0KYGBge3J9DQoNCmNsZWFuU3RhdHVzJENyZXcub3IuUGFzc2VuZ2VyX2JpbmFyeSA8LSBpZmVsc2UoY2xlYW5TdGF0dXMkQ3Jldy5vci5QYXNzZW5nZXIgPT0gIlBhc3NlbmdlciIsIDEsIDApDQoNCnN0YXR1c19DcmV3Lm9yLlBhc3NlbmdlciA8LSB0YWJsZShjbGVhblN0YXR1cyRzdGF0dXNfYmluYXJ5LCBjbGVhblN0YXR1cyRnZW5kZXJfYmluYXJ5KQ0KDQpwaGlfY29lZmZpY2llbnQgPC0gY29yKGNsZWFuU3RhdHVzJHN0YXR1c19iaW5hcnksIGNsZWFuU3RhdHVzJENyZXcub3IuUGFzc2VuZ2VyX2JpbmFyeSkNCg0KcHJpbnQocGhpX2NvZWZmaWNpZW50KQ0KDQoNCmBgYA0KQSBjb3JyZWxhdGlvbiBjb2VmZmljaWVudCBvZiAwLjE1IHNob3dzIGEgd2VhayBwb3NpdGl2ZSByZWxhdGlvbnNoaXAsIG1lYW5pbmcgcGFzc2VuZ2VycyB3ZXJlIHNsaWdodGx5IG1vcmUgbGlrZWx5IHRvIHN1cnZpdmUgdGhhbiBjcmV3IG1lbWJlcnMgb24gdGhlIFRpdGFuaWMuIA0KDQojIFN1cnZpdmFsIGJhc2VkIG9uIGNsYXNzDQoNCmBgYHtyfQ0KZ2dwbG90KGNsZWFuU3RhdHVzLCBhZXMoeCA9IENsYXNzLi4uRGVwYXJ0bWVudCwgZmlsbCA9IFN0YXR1cykpICsNCiAgZ2VvbV9iYXIocG9zaXRpb24gPSAiZG9kZ2UiKSArDQogIHRoZW1lKGF4aXMudGV4dC54ID0gZWxlbWVudF90ZXh0KGFuZ2xlID0gNDUsIGhqdXN0ID0gMSkpICsNCiAgZ2VvbV90ZXh0KHN0YXQgPSAiY291bnQiLCBhZXMobGFiZWwgPSAuLmNvdW50Li4pLCBwb3NpdGlvbiA9IHBvc2l0aW9uX2RvZGdlKHdpZHRoID0gMC45KSwgdmp1c3QgPSAtMC41KSArDQogIGxhYnModGl0bGUgPSAiU3Vydml2YWwgUmF0ZSBieSBjbGFzcyBhbmQgZGVwYXJ0bWVudCBvZiBwYXNzZW5nZXJzIiwNCiAgICAgICB4ID0gIlN0YXR1cyIsDQogICAgICAgeSA9ICJDb3VudCIsDQogICAgICAgZmlsbCA9ICJDbGFzcyBEZXBhcnRtZW50IikgDQpgYGANCg0KIHdlIHNpbmdsZSBvdXQgb25seSBwYXNzZW5nZXJzIG9mIHNoaXANCiANCmBgYHtyfQ0KDQpvbmx5UGFzc2VuZ2VycyA8LSBjbGVhblN0YXR1c1shY2xlYW5TdGF0dXMkQ3Jldy5vci5QYXNzZW5nZXIuID09ICJDcmV3IiwgXQ0KDQpnZ3Bsb3Qob25seVBhc3NlbmdlcnMsIGFlcyh4ID0gQ2xhc3MuLi5EZXBhcnRtZW50LCBmaWxsID0gU3RhdHVzKSkgKw0KICBnZW9tX2Jhcihwb3NpdGlvbiA9ICJkb2RnZSIpICsNCiAgdGhlbWUoYXhpcy50ZXh0LnggPSBlbGVtZW50X3RleHQoYW5nbGUgPSA0NSwgaGp1c3QgPSAxKSkgKw0KICBnZW9tX3RleHQoc3RhdCA9ICJjb3VudCIsIGFlcyhsYWJlbCA9IC4uY291bnQuLiksIHBvc2l0aW9uID0gcG9zaXRpb25fZG9kZ2Uod2lkdGggPSAwLjkpLCB2anVzdCA9IC0wLjUpICsNCiAgbGFicyh0aXRsZSA9ICJTdXJ2aXZhbCBSYXRlIGJ5IGNsYXNzIGFuZCBkZXBhcnRtZW50IG9mIHBhc3NlbmdlcnMiLA0KICAgICAgIHggPSAiU3RhdHVzIiwNCiAgICAgICB5ID0gIkNvdW50IiwNCiAgICAgICBmaWxsID0gIkNsYXNzIERlcGFydG1lbnQiKSANCg0KDQpgYGANCg0KYGBge3J9DQoNCg0KDQoNCg0KYGBgDQoNCg0K