Background

This R Markdown document works through an assessment for the course Data Science: Data Visualization, offered by Harvard University on edX.

Context: The Titanic was a British ocean liner that struck an iceberg and sank on its maiden voyage in 1912 from the United Kingdom to New York. More than 1,500 of the estimated 2,224 passengers and crew died in the accident, making this one of the largest maritime disasters ever outside of war. The ship carried a wide range of passengers of all ages and both genders, from luxury travelers in first class to immigrants in the lower classes. However, not all passengers were equally likely to survive the accident. We use real data about a selection of 891 passengers to learn who was on the Titanic and which passengers were more likely to survive.

Loading the required packages

library(tidyverse)
library(openintro)
library(titanic)
library(knitr)

Getting and setting up the required data

titanic <- titanic_train %>%
    select(Survived, Pclass, Sex, Age, SibSp, Parch, Fare) %>%
    mutate(Survived = factor(Survived),
           Pclass = factor(Pclass),
           Sex = factor(Sex))

Initial exploration of the dataset

Exercise 1

Inspecting the variable types. See ?titanic_train for information on the variables.

glimpse(titanic)
## Rows: 891
## Columns: 7
## $ Survived <fct> 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1...
## $ Pclass   <fct> 3, 1, 3, 1, 3, 3, 1, 3, 3, 2, 3, 1, 3, 3, 3, 2, 3, 2, 3, 3...
## $ Sex      <fct> male, female, female, female, male, male, male, male, fema...
## $ Age      <dbl> 22, 38, 26, 35, 35, NA, 54, 2, 27, 14, 4, 58, 20, 39, 14, ...
## $ SibSp    <int> 1, 1, 0, 1, 0, 0, 0, 3, 0, 1, 1, 0, 0, 1, 0, 0, 4, 0, 1, 0...
## $ Parch    <int> 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 5, 0, 0, 1, 0, 0, 0...
## $ Fare     <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4583, 51.8625,...

Exercise 2

Exploring the demographics of Titanic passengers

Age distribution of Titanic passengers, by sex

titanic %>%
  ggplot(aes(x = Age, color = Sex)) +
  geom_density()

Counts of male vs female passengers

titanic %>%
  count(Sex) %>%
  knitr::kable()
| Sex    |   n |
|:-------|----:|
| female | 314 |
| male   | 577 |

Now, checking the claim “The proportion of females under age 17 was higher than the proportion of males under age 17.”

titanic %>%
  filter(!is.na(Age)) %>%
  mutate(under_17=if_else(Age<17,1,0)) %>%
  group_by(Sex) %>%
  summarise(mean(under_17)) %>%
  knitr::kable()
| Sex    | mean(under_17) |
|:-------|---------------:|
| female |      0.1877395 |
| male   |      0.1125828 |

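The claim is correct: among passengers with a recorded age, about 18.8% of females were under 17, compared with about 11.3% of males.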
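The calculation above relies on the fact that the mean of a 0/1 indicator is the proportion of ones. A minimal sketch of that idea in base R, using a small toy data frame whose values are invented for illustration (they are not Titanic data):

```r
# Toy data, invented for illustration only (not taken from titanic_train)
toy <- data.frame(
  Sex = c("female", "female", "female", "female", "male", "male", "male", "male"),
  Age = c(5, 16, 30, NA, 10, 40, 55, 62)
)

# Drop rows with a missing age, mirroring filter(!is.na(Age))
toy <- toy[!is.na(toy$Age), ]

# A logical vector is treated as 0/1, so its mean is a proportion
prop_under_17 <- tapply(toy$Age < 17, toy$Sex, mean)
print(prop_under_17)
```

Removing the missing ages first matters: otherwise the group means would be NA rather than proportions among passengers with a known age.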

Checking the second claim “The proportion of males age 18-35 was higher than the proportion of females age 18-35.”

titanic %>%
  filter(!is.na(Age)) %>%
  mutate(Eighteen_thirtyfive=if_else(Age>=18 & Age<=35,1,0)) %>%
  group_by(Sex) %>%
  summarise(mean(Eighteen_thirtyfive)) %>%
  knitr::kable()
| Sex    | mean(Eighteen_thirtyfive) |
|:-------|--------------------------:|
| female |                 0.5095785 |
| male   |                 0.5540839 |

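This claim is also correct: among passengers with a recorded age, about 55.4% of males were aged 18 to 35, compared with about 51.0% of females.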

Exercise 3

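QQ-plot of passenger age against a normal distribution with the same mean and standard deviation as the observed ages. The identity line shows where the points would fall if age were normally distributed.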

params <- titanic %>%
    filter(!is.na(Age)) %>%
    summarize(mean = mean(Age), sd = sd(Age))

ggplot(titanic, aes(sample=Age))+
  geom_qq(dparams = params)+
  geom_abline()
## Warning: Removed 177 rows containing non-finite values (stat_qq).
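To make explicit what the QQ-plot compares, the sketch below computes sample quantiles and the matching normal quantiles by hand in base R. The ages are simulated (an assumption for illustration, not the Titanic data), and the probability grid `p` is an arbitrary choice; `geom_qq()` selects its own probability points, so this illustrates the idea rather than reproducing the plot.

```r
set.seed(1)
# Simulated ages, for illustration only (not Titanic data)
ages <- rnorm(200, mean = 30, sd = 14)

# Probabilities at which both sets of quantiles are evaluated (arbitrary grid)
p <- seq(0.05, 0.95, by = 0.05)

# Observed quantiles of the sample
sample_q <- quantile(ages, p)

# Quantiles of a normal distribution with the sample's mean and sd,
# which is the role dparams plays in the plot above
theoretical_q <- qnorm(p, mean = mean(ages), sd = sd(ages))

comparison <- data.frame(p = p,
                         sample = unname(sample_q),
                         theoretical = theoretical_q)
print(head(comparison))
```

Because the theoretical quantiles use the sample's own mean and standard deviation, both columns are on the same scale, which is why an identity line is a sensible reference.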

Exploring survival by different factors

Exercise 4

Survival by sex of passengers in the Titanic

ggplot(titanic, aes(x=Survived, fill=Sex))+
  geom_bar(position = "dodge")

Survival by age

ggplot(titanic, aes(x=Age, fill=Survived))+
  geom_density(alpha=0.2, aes(y=after_stat(count)))

Exercise 5

Survival by fare

titanic %>%
  filter(Fare != 0) %>%   # drop zero fares, which have no finite log10 value
  ggplot(aes(x = Survived, y = Fare)) +
  geom_boxplot() +
  geom_jitter(alpha = 0.2, width = 0.02) +
  scale_y_log10()
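The plot above removes zero fares before applying a log scale. A short base R sketch of why, using a toy fare vector that is hypothetical and chosen only for illustration:

```r
# Hypothetical toy fares, including two zero values
fares <- c(0, 7.25, 71.2833, 0, 8.05)

# log10 of zero is -Inf, which cannot be placed on a log-scaled axis
log_fares <- log10(fares)
print(log_fares)

# Keep only the fares that can be shown on a log scale
kept <- fares[fares != 0]
print(log10(kept))
```

Filtering explicitly keeps the decision visible in the code, rather than leaving the non-finite values to be handled at plotting time.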

Exercise 6

Survival by passenger class

We’ll create three bar plots to answer three different questions.

ggplot(titanic, aes(x=Pclass, fill=Survived))+
  geom_bar()

Bar plots of passenger class filled by Survival with counts on the y-axis

ggplot(titanic, aes(x=Pclass, fill=Survived))+
  geom_bar(position = position_fill())

Bar plots of passenger class filled by Survival with proportion on the y-axis

ggplot(titanic, aes(x=Survived, fill=Pclass))+
  geom_bar(position = position_fill())

Proportion bar plot
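The two filled bar plots normalize in different directions: with Pclass on the x-axis, each bar shows survival shares within a class, while with Survived on the x-axis, each bar shows class shares within a survival status. A sketch of the same two calculations with `prop.table()` in base R, on a small toy data frame invented for illustration (not Titanic data):

```r
# Toy data, invented for illustration only
toy <- data.frame(
  Pclass   = factor(c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3)),
  Survived = factor(c(1, 1, 0, 1, 0, 0, 0, 0, 1, 0))
)

# Cross-tabulated counts: what the stacked count bar plot displays
counts <- table(Pclass = toy$Pclass, Survived = toy$Survived)

# Shares within each class (x = Pclass, fill = Survived): rows sum to 1
by_class <- prop.table(counts, margin = 1)

# Shares within each survival status (x = Survived, fill = Pclass): columns sum to 1
by_status <- prop.table(counts, margin = 2)

print(by_class)
print(by_status)
```

The same counts give different answers depending on the margin: `by_class` addresses how likely survival was in a given class, whereas `by_status` addresses which classes the survivors and non-survivors came from.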

Exercise 7

Survival by Age, Sex and Passenger Class

ggplot(titanic, aes(Age, fill=Survived))+
  geom_density(aes(y=after_stat(count)), alpha=0.4)+
  facet_grid(Sex~Pclass)
