This report explores the Titanic Dataset downloaded from Kaggle at the following Link.
First look into the dataset (summary, structure,…):
# structure
str(titanic_data)
## 'data.frame': 1310 obs. of 14 variables:
## $ pclass : int 1 1 1 1 1 1 1 1 1 1 ...
## $ survived : int 1 1 0 0 0 1 1 0 1 0 ...
## $ name : chr "Allen, Miss. Elisabeth Walton" "Allison, Master. Hudson Trevor" "Allison, Miss. Helen Loraine" "Allison, Mr. Hudson Joshua Creighton" ...
## $ sex : chr "female" "male" "female" "male" ...
## $ age : num 29 0.917 2 30 25 ...
## $ sibsp : int 0 1 1 1 1 0 1 0 2 0 ...
## $ parch : int 0 2 2 2 2 0 0 0 0 0 ...
## $ ticket : chr "24160" "113781" "113781" "113781" ...
## $ fare : num 211 152 152 152 152 ...
## $ cabin : chr "B5" "C22 C26" "C22 C26" "C22 C26" ...
## $ embarked : chr "S" "S" "S" "S" ...
## $ boat : chr "2" "11" "" "" ...
## $ body : int NA NA NA 135 NA NA NA NA NA 22 ...
## $ home.dest: chr "St Louis, MO" "Montreal, PQ / Chesterville, ON" "Montreal, PQ / Chesterville, ON" "Montreal, PQ / Chesterville, ON" ...
# summary
summary(titanic_data)
## pclass survived name sex
## Min. :1.000 Min. :0.000 Length:1310 Length:1310
## 1st Qu.:2.000 1st Qu.:0.000 Class :character Class :character
## Median :3.000 Median :0.000 Mode :character Mode :character
## Mean :2.295 Mean :0.382
## 3rd Qu.:3.000 3rd Qu.:1.000
## Max. :3.000 Max. :1.000
## NA's :1 NA's :1
## age sibsp parch ticket
## Min. : 0.1667 Min. :0.0000 Min. :0.000 Length:1310
## 1st Qu.:21.0000 1st Qu.:0.0000 1st Qu.:0.000 Class :character
## Median :28.0000 Median :0.0000 Median :0.000 Mode :character
## Mean :29.8811 Mean :0.4989 Mean :0.385
## 3rd Qu.:39.0000 3rd Qu.:1.0000 3rd Qu.:0.000
## Max. :80.0000 Max. :8.0000 Max. :9.000
## NA's :264 NA's :1 NA's :1
## fare cabin embarked boat
## Min. : 0.000 Length:1310 Length:1310 Length:1310
## 1st Qu.: 7.896 Class :character Class :character Class :character
## Median : 14.454 Mode :character Mode :character Mode :character
## Mean : 33.295
## 3rd Qu.: 31.275
## Max. :512.329
## NA's :2
## body home.dest
## Min. : 1.0 Length:1310
## 1st Qu.: 72.0 Class :character
## Median :155.0 Mode :character
## Mean :160.8
## 3rd Qu.:256.0
## Max. :328.0
## NA's :1189
# data frame dimensions
rows <- nrow(titanic_data)
cols <- ncol(titanic_data)
# pre-look
head(titanic_data)
## pclass survived name sex
## 1 1 1 Allen, Miss. Elisabeth Walton female
## 2 1 1 Allison, Master. Hudson Trevor male
## 3 1 0 Allison, Miss. Helen Loraine female
## 4 1 0 Allison, Mr. Hudson Joshua Creighton male
## 5 1 0 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female
## 6 1 1 Anderson, Mr. Harry male
## age sibsp parch ticket fare cabin embarked boat body
## 1 29.0000 0 0 24160 211.3375 B5 S 2 NA
## 2 0.9167 1 2 113781 151.5500 C22 C26 S 11 NA
## 3 2.0000 1 2 113781 151.5500 C22 C26 S NA
## 4 30.0000 1 2 113781 151.5500 C22 C26 S 135
## 5 25.0000 1 2 113781 151.5500 C22 C26 S NA
## 6 48.0000 0 0 19952 26.5500 E12 S 3 NA
## home.dest
## 1 St Louis, MO
## 2 Montreal, PQ / Chesterville, ON
## 3 Montreal, PQ / Chesterville, ON
## 4 Montreal, PQ / Chesterville, ON
## 5 Montreal, PQ / Chesterville, ON
## 6 New York, NY
Data frame consists of 14 variables or columns and 1310 measurements or rows (different diamonds).
In this report we try to explore the given titanic dataset and answer questions such as
More passengers died than survived. Those who survived, the majority were females and those who died, the majority were males.
## Warning: Removed 264 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
The median age of the 1st, 2nd and 3rd class passgengets was 39, 29 and 24. Higher degree of the passengers were young adults i.e. in their 20s or 30s in the 3rd class. 3rd class also had more outliers (older people) as compared to 1st and 2nd class
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_count()`).
1st class had more passengers that survived than died. Majority of the people in 3rd class perished
Passengers who arrived via C (Cherbourg), more survived than died. Passengers who arrived via Q (Queenstown) and S (Southampton), more died than survived
## Warning: Removed 265 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 265 rows containing missing values or values outside the scale range
## (`geom_point()`).
This graph shows there is weak positive correlation between age and fare. As the age goes up, fare goes up.